Preface


This  text  is  designed  to  teach  the  concepts  and  techniques  of  basic  linear  algebra 
as  a rigorous  mathematical  subject.  Besides  computational  proficiency,  there  is  an 
emphasis on understanding definitions and theorems, as well as reading, understanding
and creating proofs. A strictly logical organization, complete and exceedingly
detailed  proofs  of  every  theorem,  advice  on  techniques  for  reading  and  writing  proofs, 
and  a selection  of  challenging  theoretical  exercises  will  slowly  provide  the  novice 
with  the  tools  and  confidence  to  be  able  to  study  other  mathematical  topics  in  a 
rigorous  fashion. 

Most  students  taking  a course  in  linear  algebra  will  have  completed  courses  in 
differential  and  integral  calculus,  and  maybe  also  multivariate  calculus,  and  will 
typically be second-year students in university. This level of mathematical maturity is
expected; however, there is little or no requirement to know calculus itself to use this
book  successfully.  With  complete  details  for  every  proof,  for  nearly  every  example, 
and  for  solutions  to  a majority  of  the  exercises,  the  book  is  ideal  for  self-study,  for 
those  of  any  age. 

While  there  is  an  abundance  of  guidance  in  the  use  of  the  software  system,  Sage, 
there  is  no  attempt  to  address  the  problems  of  numerical  linear  algebra,  which  are 
arguably  continuous  in  nature.  Similarly,  there  is  little  emphasis  on  a geometric 
approach  to  problems  of  linear  algebra.  While  this  may  contradict  the  experience  of 
many  experienced  mathematicians,  the  approach  here  is  consciously  algebraic.  As  a 
result,  the  student  should  be  well-prepared  to  encounter  groups,  rings  and  fields  in 
future  courses  in  algebra,  or  other  areas  of  discrete  mathematics. 

How  to  Use  This  Book 

While  the  book  is  divided  into  chapters,  the  main  organizational  unit  is  the  thirty- 
seven  sections.  Each  contains  a selection  of  definitions,  theorems,  and  examples 
interspersed  with  commentary.  If  you  are  enrolled  in  a course,  read  the  section  before 
class  and  then  answer  the  section’s  reading  questions  as  preparation  for  class. 

The version available for viewing in a web browser is the most complete, integrating
all of the components of the book. Consider acquainting yourself with this version.
Knowls are indicated by a dashed underline and will allow you to seamlessly remind
yourself  of  the  content  of  definitions,  theorems,  examples,  exercises,  subsections  and 
more.  Use  them  liberally. 

Historically,  mathematics  texts  have  numbered  definitions  and  theorems.  We 
have  instead  adopted  a strategy  more  appropriate  to  the  heavy  cross-referencing, 
linking  and  knowling  afforded  by  modern  media.  Mimicking  an  approach  taken  by 
Donald  Knuth,  we  have  given  items  short  titles  and  associated  acronyms.  You  will 
become  comfortable  with  this  scheme  after  a short  time,  and  might  even  come  to 
appreciate  its  inherent  advantages.  In  the  web  version,  each  chapter  has  a list  of  ten 
or  so  important  items  from  that  chapter,  and  you  will  find  yourself  recognizing  some 
of  these  acronyms  with  no  extra  effort  beyond  the  normal  amount  of  study.  Bruno 
Mello  suggests  that  some  say  an  acronym  should  be  pronounceable  as  a word  (such 
as  “radar”),  and  otherwise  is  an  abbreviation.  We  will  not  be  so  strict  in  our  use  of 
the  term. 

Exercises  come  in  three  flavors,  indicated  by  the  first  letter  of  their  label.  “C” 
indicates  a problem  that  is  essentially  computational.  “T”  represents  a problem 
that  is  more  theoretical,  usually  requiring  a solution  that  is  as  rigorous  as  a proof. 
“M” stands for problems that are “medium”, “moderate”, “midway”, “mediate” or
“median”, but never “mediocre.” Their statements could feel computational, but their
solutions  require  a more  thorough  understanding  of  the  concepts  or  theory,  while 
perhaps  not  being  as  rigorous  as  a proof.  Of  course,  such  a tripartite  division  will 
be  subject  to  interpretation.  Otherwise,  larger  numerical  values  indicate  greater 
perceived  difficulty,  with  gaps  allowing  for  the  contribution  of  new  problems  from 
readers.  Many,  but  not  all,  exercises  have  complete  solutions.  These  are  indicated 
by  daggers  in  the  PDF  and  print  versions,  with  solutions  available  in  an  online 
supplement,  while  in  the  web  version  a solution  is  indicated  by  a knowl  right  after  the 
problem  statement.  Resist  the  urge  to  peek  early.  Working  the  exercises  diligently  is 
the  best  way  to  master  the  material. 

The  Archetypes  are  a collection  of  twenty-four  archetypical  examples.  The  open 
source  lexical  database,  WordNet,  defines  an  archetype  as  “something  that  serves  as 
a model  or  a basis  for  making  copies.”  We  employ  the  word  in  the  first  sense  here. 
By  carefully  choosing  the  examples  we  hope  to  provide  at  least  one  example  that 
is  interesting  and  appropriate  for  many  of  the  theorems  and  definitions,  and  also 
provide  counterexamples  to  conjectures  (and  especially  counterexamples  to  converses 
of  theorems).  Each  archetype  has  numerous  computational  results  which  you  could 
strive  to  duplicate  as  you  encounter  new  definitions  and  theorems.  There  are  some 
exercises  which  will  help  guide  you  in  this  quest. 

Supplements 

Print  versions  of  the  book  (either  a physical  copy  or  a PDF  version)  have  significant 
material  available  as  supplements.  Solutions  are  contained  in  the  Exercise  Manual. 
Advice on the use of the open source mathematical software system, Sage, is contained
in  another  supplement.  (Look  for  a linear  algebra  “Quick  Reference”  sheet  at  the 
Sage  website.)  The  Archetypes  are  available  in  a PDF  form  which  could  be  used 
as  a workbook.  Flashcards,  with  the  statement  of  every  definition  and  theorem,  in 
order  of  appearance,  are  also  available. 

Freedom 

This  book  is  copyrighted  by  its  author.  Some  would  say  it  is  his  “intellectual  property,” 
a distasteful  phrase  if  there  ever  was  one.  Rather  than  exercise  all  the  restrictions 
provided  by  the  government-granted  monopoly  that  is  copyright,  the  author  has 
granted  you  a license,  the  GNU  Free  Documentation  License  (GFDL).  In  summary 
it  says  you  may  receive  an  electronic  copy  at  no  cost  via  electronic  networks  and  you 
may  make  copies  forever.  So  your  copy  of  the  book  never  has  to  go  “out-of-print.” 
You  may  redistribute  copies  and  you  may  make  changes  to  your  copy  for  your  own 
use.  However,  you  have  one  major  responsibility  in  accepting  this  license.  If  you 
make  changes  and  distribute  the  changed  version,  then  you  must  offer  the  same 
license  for  the  new  version,  you  must  acknowledge  the  original  author’s  work,  and 
you  must  indicate  where  you  have  made  changes. 

In  practice,  if  you  see  a change  that  needs  to  be  made  (like  correcting  an  error, 
or  adding  a particularly  nice  theoretical  exercise),  you  may  just  wish  to  donate  the 
change  to  the  author  rather  than  create  and  maintain  a new  version.  Such  donations 
are  highly  encouraged  and  gratefully  accepted.  You  may  notice  the  large  number  of 
small  mistakes  that  have  been  corrected  by  readers  that  have  come  before  you.  Pay 
it  forward. 

So,  in  one  word,  the  book  really  is  “free”  (as  in  “no  cost”).  But  the  open  license 
employed  is  vastly  different  than  “free  to  download,  all  rights  reserved.”  Most 
importantly,  you  know  that  this  book,  and  its  ideas,  are  not  the  property  of  anyone. 
Or  they  are  the  property  of  everyone.  Either  way,  this  book  has  its  own  inherent 
“freedom,”  separate  from  those  who  contribute  to  it.  Much  of  this  philosophy  is 
embodied  in  the  following  quote: 

If  nature  has  made  any  one  thing  less  susceptible  than  all  others  of 
exclusive  property,  it  is  the  action  of  the  thinking  power  called  an  idea, 
which  an  individual  may  exclusively  possess  as  long  as  he  keeps  it  to 
himself;  but  the  moment  it  is  divulged,  it  forces  itself  into  the  possession 
of  every  one,  and  the  receiver  cannot  dispossess  himself  of  it.  Its  peculiar 
character,  too,  is  that  no  one  possesses  the  less,  because  every  other 
possesses  the  whole  of  it.  He  who  receives  an  idea  from  me,  receives 
instruction  himself  without  lessening  mine;  as  he  who  lights  his  taper 
at  mine,  receives  light  without  darkening  me.  That  ideas  should  freely 
spread  from  one  to  another  over  the  globe,  for  the  moral  and  mutual 
instruction  of  man,  and  improvement  of  his  condition,  seems  to  have 
been peculiarly and benevolently designed by nature, when she made
them,  like  fire,  expansible  over  all  space,  without  lessening  their  density 
in  any  point,  and  like  the  air  in  which  we  breathe,  move,  and  have  our 
physical  being,  incapable  of  confinement  or  exclusive  appropriation. 

Thomas  Jefferson 
Letter  to  Isaac  McPherson 
August  13,  1813 


To  the  Instructor 

The  first  half  of  this  text  (through  Chapter  M)  is  a course  in  matrix  algebra,  though 
the  foundation  of  some  more  advanced  ideas  is  also  being  formed  in  these  early 
sections  (such  as  Theorem  NMUS,  which  presages  invertible  linear  transformations). 
Vectors are presented exclusively as column vectors (not transposes of row vectors),
and  linear  combinations  are  presented  very  early.  Spans,  null  spaces,  column  spaces 
and  row  spaces  are  also  presented  early,  simply  as  sets,  saving  most  of  their  vector 
space  properties  for  later,  so  they  are  familiar  objects  before  being  scrutinized 
carefully. 

You  cannot  do  everything  early,  so  in  particular  matrix  multiplication  comes  later 
than  usual.  However,  with  a definition  built  on  linear  combinations  of  column  vectors, 
it  should  seem  more  natural  than  the  more  frequent  definition  using  dot  products 
of  rows  with  columns.  And  this  delay  emphasizes  that  linear  algebra  is  built  upon 
vector  addition  and  scalar  multiplication.  Of  course,  matrix  inverses  must  wait  for 
matrix  multiplication,  but  this  does  not  prevent  nonsingular  matrices  from  occurring 
sooner.  Vector  space  properties  are  hinted  at  when  vector  and  matrix  operations 
are  first  defined,  but  the  notion  of  a vector  space  is  saved  for  a more  axiomatic 
treatment  later  (Chapter  VS).  Once  bases  and  dimension  have  been  explored  in 
the  context  of  vector  spaces,  linear  transformations  and  their  matrix  representation 
follow.  The  predominant  purpose  of  the  book  is  the  four  sections  of  Chapter  R,  which 
introduces  the  student  to  representations  of  vectors  and  matrices,  change-of-basis, 
and  orthonormal  diagonalization  (the  spectral  theorem).  This  final  chapter  pulls 
together  all  the  important  ideas  of  the  previous  chapters. 

Our  vector  spaces  use  the  complex  numbers  as  the  field  of  scalars.  This  avoids 
the  fiction  of  complex  eigenvalues  being  used  to  form  scalar  multiples  of  eigenvectors. 
The  presence  of  the  complex  numbers  in  the  earliest  sections  should  not  frighten 
students  who  need  a review,  since  they  will  not  be  used  heavily  until  much  later, 
and  Section  CNO  provides  a quick  review. 

Linear  algebra  is  an  ideal  subject  for  the  novice  mathematics  student  to  learn  how 
to  develop  a subject  precisely,  with  all  the  rigor  mathematics  requires.  Unfortunately, 
much  of  this  rigor  seems  to  have  escaped  the  standard  calculus  curriculum,  so 
for  many  university  students  this  is  their  first  exposure  to  careful  definitions  and 
theorems, and the expectation that they fully understand them, to say nothing of the
expectation  that  they  become  proficient  in  formulating  their  own  proofs.  We  have 
tried  to  make  this  text  as  helpful  as  possible  with  this  transition.  Every  definition 
is  stated  carefully,  set  apart  from  the  text.  Likewise,  every  theorem  is  carefully 
stated,  and  almost  every  one  has  a complete  proof.  Theorems  usually  have  just  one 
conclusion,  so  they  can  be  referenced  precisely  later.  Definitions  and  theorems  are 
cataloged  in  order  of  their  appearance  (Definitions  and  Theorems  in  the  Reference 
chapter  at  the  end  of  the  book).  Along  the  way,  there  are  discussions  of  some  more 
important ideas relating to formulating proofs (Proof Techniques), which are partly
advice and partly a primer on logic.

Collecting  responses  to  the  Reading  Questions  prior  to  covering  material  in  class 
will  require  students  to  learn  how  to  read  the  material.  Sections  are  designed  to  be 
covered  in  a fifty-minute  lecture.  Later  sections  are  longer,  but  as  students  become 
more  proficient  at  reading  the  text,  it  is  possible  to  survey  these  longer  sections  at 
the  same  pace.  With  solutions  to  many  of  the  exercises,  students  may  be  given  the 
freedom  to  work  homework  at  their  own  pace  and  style  (individually,  in  groups,  with 
an  instructor’s  help,  etc.).  To  compensate  and  keep  students  from  falling  behind,  I 
give  an  examination  on  each  chapter. 

Sage  is  a powerful  open  source  program  for  advanced  mathematics.  It  is  especially 
robust  for  linear  algebra.  We  have  included  an  abundance  of  material  which  will  help 
the  student  (and  instructor)  learn  how  to  use  Sage  for  the  study  of  linear  algebra 
and  how  to  understand  linear  algebra  better  with  Sage.  This  material  is  tightly 
integrated  with  the  web  version  of  the  book  and  will  become  even  easier  to  use 
since  the  technology  for  interfaces  to  Sage  continues  to  rapidly  evolve.  Sage  is  highly 
capable  for  mathematical  research  as  well,  and  so  should  be  a tool  that  students  can 
use  in  subsequent  courses  and  careers. 

Conclusion 

Linear  algebra  is  a beautiful  subject.  I have  enjoyed  preparing  this  exposition  and 
making  it  widely  available.  Much  of  my  motivation  for  writing  this  book  is  captured 
by the sentiments expressed by H.M. Cundy and A.P. Rollett in their Preface to the
First  Edition  of  Mathematical  Models  (1952),  especially  the  final  sentence, 

This  book  was  born  in  the  classroom,  and  arose  from  the  spontaneous 
interest  of  a Mathematical  Sixth  in  the  construction  of  simple  models.  A 
desire  to  show  that  even  in  mathematics  one  could  have  fun  led  to  an 
exhibition  of  the  results  and  attracted  considerable  attention  throughout 
the  school.  Since  then  the  Sherborne  collection  has  grown,  ideas  have  come 
from  many  sources,  and  widespread  interest  has  been  shown.  It  seems 
therefore  desirable  to  give  permanent  form  to  the  lessons  of  experience  so 
that  others  can  benefit  by  them  and  be  encouraged  to  undertake  similar 
work. 

Foremost, I hope that students find their time spent with this book profitable. I
hope  that  instructors  find  it  flexible  enough  to  fit  the  needs  of  their  course.  You  can 
always  find  the  latest  version,  and  keep  current  with  any  changes,  at  the  book’s  website 
(http://linear.pugetsound.edu).  I appreciate  receiving  suggestions,  corrections, 
and  other  comments,  so  please  do  contact  me. 


Robert  A.  Beezer 
Tacoma,  Washington 
December  2012 

Chapter SLE

Systems  of  Linear  Equations 


We  will  motivate  our  study  of  linear  algebra  by  studying  solutions  to  systems  of 
linear  equations.  While  the  focus  of  this  chapter  is  on  the  practical  matter  of  how 
to  find,  and  describe,  these  solutions,  we  will  also  be  setting  ourselves  up  for  more 
theoretical  ideas  that  will  appear  later. 


Section  WILA 

What  is  Linear  Algebra? 

We  begin  our  study  of  linear  algebra  with  an  introduction  and  a motivational  example. 


Subsection  LA 
Linear  + Algebra 

The  subject  of  linear  algebra  can  be  partially  explained  by  the  meaning  of  the 
two  terms  comprising  the  title.  “Linear”  is  a term  you  will  appreciate  better  at 
the  end  of  this  course,  and  indeed,  attaining  this  appreciation  could  be  taken  as 
one of the primary goals of this course. However, for now, you can understand it
to mean anything that is “straight” or “flat.” For example, in the xy-plane you
might be accustomed to describing straight lines (is there any other kind?) as
the set of solutions to an equation of the form y = mx + b, where the slope m
and the y-intercept b are constants that together describe the line. If you have
studied multivariate calculus, then you will have encountered planes. Living in three
dimensions, with coordinates described by triples (x, y, z), they can be described
as the set of solutions to equations of the form ax + by + cz = d, where a, b, c, d
are constants that together determine the plane. While we might describe planes as
“flat,” lines in three dimensions might be described as “straight.” From a multivariate
calculus course you will recall that lines are sets of points described by equations
such as x = 3t - 4, y = -7t + 2, z = 9t, where t is a parameter that can take on any
value.

Another  view  of  this  notion  of  “flatness”  is  to  recognize  that  the  sets  of  points 
just  described  are  solutions  to  equations  of  a relatively  simple  form.  These  equations 
involve  addition  and  multiplication  only.  We  will  have  a need  for  subtraction,  and 
occasionally  we  will  divide,  but  mostly  you  can  describe  “linear”  equations  as 
involving  only  addition  and  multiplication.  Here  are  some  examples  of  typical 
equations  we  will  see  in  the  next  few  sections: 

\[ 2x + 3y - 4z = 13 \qquad 4x_1 + 5x_2 - x_3 + x_4 + x_5 = 0 \qquad 9a - 2b + 7c + 2d = -7 \]

What  we  will  not  see  are  equations  like: 

\[ xy + 5yz = 13 \qquad x_1 + x_2^3/x_4 - x_3 x_4 x_5^2 = 0 \qquad \tan(ab) + \log(c - d) = -7 \]

The  exception  will  be  that  we  will  on  occasion  need  to  take  a square  root. 

You  have  probably  heard  the  word  “algebra”  frequently  in  your  mathematical 
preparation  for  this  course.  Most  likely,  you  have  spent  a good  ten  to  fifteen  years 
learning  the  algebra  of  the  real  numbers,  along  with  some  introduction  to  the  very 
similar  algebra  of  complex  numbers  (see  Section  CNO).  However,  there  are  many 
new  algebras  to  learn  and  use,  and  likely  linear  algebra  will  be  your  second  algebra. 
Like  learning  a second  language,  the  necessary  adjustments  can  be  challenging  at 
times,  but  the  rewards  are  many.  And  it  will  make  learning  your  third  and  fourth 
algebras  even  easier.  Perhaps  you  have  heard  of  “groups”  and  “rings”  (or  maybe 
you  have  studied  them  already),  which  are  excellent  examples  of  other  algebras  with 
very  interesting  properties  and  applications.  In  any  event,  prepare  yourself  to  learn 
a new  algebra  and  realize  that  some  of  the  old  rules  you  used  for  the  real  numbers 
may  no  longer  apply  to  this  new  algebra  you  will  be  learning! 

The  brief  discussion  above  about  lines  and  planes  suggests  that  linear  algebra 
has  an  inherently  geometric  nature,  and  this  is  true.  Examples  in  two  and  three 
dimensions  can  be  used  to  provide  valuable  insight  into  important  concepts  of  this 
course.  However,  much  of  the  power  of  linear  algebra  will  be  the  ability  to  work  with 
“flat”  or  “straight”  objects  in  higher  dimensions,  without  concerning  ourselves  with 
visualizing  the  situation.  While  much  of  our  intuition  will  come  from  examples  in 
two  and  three  dimensions,  we  will  maintain  an  algebraic  approach  to  the  subject, 
with  the  geometry  being  secondary.  Others  may  wish  to  switch  this  emphasis  around, 
and  that  can  lead  to  a very  fruitful  and  beneficial  course,  but  here  and  now  we  are 
laying  our  bias  bare. 

Subsection  AA 
An  Application 

We  conclude  this  section  with  a rather  involved  example  that  will  highlight  some  of 
the  power  and  techniques  of  linear  algebra.  Work  through  all  of  the  details  with  pencil 
and  paper,  until  you  believe  all  the  assertions  made.  However,  in  this  introductory 
example,  do  not  concern  yourself  with  how  some  of  the  results  are  obtained  or  how 
you  might  be  expected  to  solve  a similar  problem.  We  will  come  back  to  this  example 
later  and  expose  some  of  the  techniques  used  and  properties  exploited.  For  now, 
use  your  background  in  mathematics  to  convince  yourself  that  everything  said  here 
really  is  correct. 

Example  TMP  Trail  Mix  Packaging 

Suppose  you  are  the  production  manager  at  a food-packaging  plant  and  one  of  your 
product  lines  is  trail  mix,  a healthy  snack  popular  with  hikers  and  backpackers, 
containing  raisins,  peanuts  and  hard-shelled  chocolate  pieces.  By  adjusting  the  mix 
of  these  three  ingredients,  you  are  able  to  sell  three  varieties  of  this  item.  The  fancy 
version  is  sold  in  half-kilogram  packages  at  outdoor  supply  stores  and  has  more 
chocolate  and  fewer  raisins,  thus  commanding  a higher  price.  The  standard  version 
is  sold  in  one  kilogram  packages  in  grocery  stores  and  gas  station  mini-markets. 
Since  the  standard  version  has  roughly  equal  amounts  of  each  ingredient,  it  is  not  as 
expensive  as  the  fancy  version.  Finally,  a bulk  version  is  sold  in  bins  at  grocery  stores 
for  consumers  to  load  into  plastic  bags  in  amounts  of  their  choosing.  To  appeal  to 
the  shoppers  that  like  bulk  items  for  their  economy  and  healthfulness,  this  mix  has 
many  more  raisins  (at  the  expense  of  chocolate)  and  therefore  sells  for  less. 

Your  production  facilities  have  limited  storage  space  and  early  each  morning 
you  are  able  to  receive  and  store  380  kilograms  of  raisins,  500  kilograms  of  peanuts 
and  620  kilograms  of  chocolate  pieces.  As  production  manager,  one  of  your  most 
important  duties  is  to  decide  how  much  of  each  version  of  trail  mix  to  make  every 
day.  Clearly,  you  can  have  up  to  1500  kilograms  of  raw  ingredients  available  each 
day,  so  to  be  the  most  productive  you  will  likely  produce  1500  kilograms  of  trail 
mix  each  day.  Also,  you  would  prefer  not  to  have  any  ingredients  leftover  each  day, 
so  that  your  final  product  is  as  fresh  as  possible  and  so  that  you  can  receive  the 
maximum  delivery  the  next  morning.  But  how  should  these  ingredients  be  allocated 
to  the  mixing  of  the  bulk,  standard  and  fancy  versions?  First,  we  need  a little  more 
information  about  the  mixes.  Workers  mix  the  ingredients  in  15  kilogram  batches, 
and  each  row  of  the  table  below  gives  a recipe  for  a 15  kilogram  batch.  There  is  some 
additional  information  on  the  costs  of  the  ingredients  and  the  price  the  manufacturer 
can  charge  for  the  different  versions  of  the  trail  mix. 


               Raisins      Peanuts      Chocolate    Cost      Sale Price
               (kg/batch)   (kg/batch)   (kg/batch)   ($/kg)    ($/kg)
Bulk           7            6            2            3.69      4.99
Standard       6            4            5            3.86      5.50
Fancy          2            5            8            4.45      6.50
Storage (kg)   380          500          620
Cost ($/kg)    2.55         4.65         4.80

As  production  manager,  it  is  important  to  realize  that  you  only  have  three 
decisions to make: the amount of bulk mix to make, the amount of standard mix to
make and the amount of fancy mix to make. Everything else is beyond your control
or is handled by another department within the company. Principally, you are also
limited by the amount of raw ingredients you can store each day. Let us denote the
amount of each mix to produce each day, measured in kilograms, by the variable
quantities b, s and f. Your production schedule can be described as values of b, s
and f that do several things. First, we cannot make negative quantities of each mix,
so

\[ b \geq 0 \qquad s \geq 0 \qquad f \geq 0 \]


Second,  if  we  want  to  consume  all  of  our  ingredients  each  day,  the  storage  capacities 
lead  to  three  (linear)  equations,  one  for  each  ingredient, 


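\begin{align*}
\frac{7}{15}b + \frac{6}{15}s + \frac{2}{15}f &= 380 && \text{(raisins)} \\
\frac{6}{15}b + \frac{4}{15}s + \frac{5}{15}f &= 500 && \text{(peanuts)} \\
\frac{2}{15}b + \frac{5}{15}s + \frac{8}{15}f &= 620 && \text{(chocolate)}
\end{align*}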


It  happens  that  this  system  of  three  equations  has  just  one  solution.  In  other 
words,  as  production  manager,  your  job  is  easy,  since  there  is  but  one  way  to  use  up 
all  of  your  raw  ingredients  making  trail  mix.  This  single  solution  is 


\[ b = 300 \text{ kg} \qquad s = 300 \text{ kg} \qquad f = 900 \text{ kg}. \]


We  do  not  yet  have  the  tools  to  explain  why  this  solution  is  the  only  one,  but  it 
should  be  simple  for  you  to  verify  that  this  is  indeed  a solution.  (Go  ahead,  we  will 
wait.)  Determining  solutions  such  as  this,  and  establishing  that  they  are  unique,  will 
be  the  main  motivation  for  our  initial  study  of  linear  algebra. 
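
The verification requested above can also be done mechanically. The following is a minimal sketch in plain Python, assuming exact rational arithmetic via the standard `fractions` module; the names `RECIPES`, `STORAGE` and `ingredients_used` are illustrative choices, not notation from the text. It only confirms that the stated schedule is a solution and says nothing about uniqueness.

```python
from fractions import Fraction

# Recipe for one 15 kg batch of each mix: (raisins, peanuts, chocolate) in kg,
# taken from the first table of Example TMP.
RECIPES = {
    "bulk": (7, 6, 2),
    "standard": (6, 4, 5),
    "fancy": (2, 5, 8),
}
STORAGE = (380, 500, 620)  # daily kg of raisins, peanuts, chocolate


def ingredients_used(b, s, f):
    """Return kg of (raisins, peanuts, chocolate) consumed by b, s, f kg of each mix."""
    amounts = {"bulk": b, "standard": s, "fancy": f}
    return tuple(
        sum(Fraction(RECIPES[mix][i], 15) * amounts[mix] for mix in RECIPES)
        for i in range(3)
    )


used = ingredients_used(300, 300, 900)
print(used == STORAGE)  # True
```

Exact fractions are used rather than floating point so that the comparison with the storage limits is an honest equality test.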

So  we  have  solved  the  problem  of  making  sure  that  we  make  the  best  use  of 
our  limited  storage  space,  and  each  day  use  up  all  of  the  raw  ingredients  that  are 
shipped  to  us.  Additionally,  as  production  manager,  you  must  report  weekly  to  the 
CEO  of  the  company,  and  you  know  he  will  be  more  interested  in  the  profit  derived 
from  your  decisions  than  in  the  actual  production  levels.  So  you  compute, 

300(4.99  - 3.69)  + 300(5.50  - 3.86)  + 900(6.50  - 4.45)  = 2727.00 

for  a daily  profit  of  $2,727  from  this  production  schedule.  The  computation  of  the 
daily  profit  is  also  beyond  our  control,  though  it  is  definitely  of  interest,  and  it  too 
looks  like  a “linear”  computation. 
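
As a check on the arithmetic, the profit computation can be written as a short sketch. This assumes Python's standard `decimal` module so that dollar amounts are handled exactly; `PRICES` and `daily_profit` are hypothetical names introduced here for illustration.

```python
from decimal import Decimal

# (cost, sale price) in dollars per kg for each mix, from the first table.
PRICES = {
    "bulk": (Decimal("3.69"), Decimal("4.99")),
    "standard": (Decimal("3.86"), Decimal("5.50")),
    "fancy": (Decimal("4.45"), Decimal("6.50")),
}


def daily_profit(schedule):
    """Sum (sale price - cost) times kilograms produced over all mixes."""
    return sum(
        (sale - cost) * schedule[mix] for mix, (cost, sale) in PRICES.items()
    )


profit = daily_profit({"bulk": 300, "standard": 300, "fancy": 900})
print(profit)  # 2727.00
```

Each term is a constant margin multiplied by a production quantity, which is the sense in which the computation looks "linear."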

As  often  happens,  things  do  not  stay  the  same  for  long,  and  now  the  marketing 
department  has  suggested  that  your  company’s  trail  mix  products  standardize  on 
every  mix  being  one-third  peanuts.  Adjusting  the  peanut  portion  of  each  recipe  by 
also  adjusting  the  chocolate  portion  leads  to  revised  recipes,  and  slightly  different 
costs  for  the  bulk  and  standard  mixes,  as  given  in  the  following  table. 

               Raisins      Peanuts      Chocolate    Cost      Sale Price
               (kg/batch)   (kg/batch)   (kg/batch)   ($/kg)    ($/kg)
Bulk           7            5            3            3.70      4.99
Standard       6            5            4            3.85      5.50
Fancy          2            5            8            4.45      6.50
Storage (kg)   380          500          620
Cost ($/kg)    2.55         4.65         4.80

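In a similar fashion as before, we desire values of b, s and f so that

\[ b \geq 0 \qquad s \geq 0 \qquad f \geq 0 \]

and

\begin{align*}
\frac{7}{15}b + \frac{6}{15}s + \frac{2}{15}f &= 380 && \text{(raisins)} \\
\frac{5}{15}b + \frac{5}{15}s + \frac{5}{15}f &= 500 && \text{(peanuts)} \\
\frac{3}{15}b + \frac{4}{15}s + \frac{8}{15}f &= 620 && \text{(chocolate)}
\end{align*}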


It  now  happens  that  this  system  of  equations  has  infinitely  many  solutions, 
as we will now demonstrate. Let f remain a variable quantity. Then suppose we
make f kilograms of the fancy mix, b = 4f - 3300 kilograms of the bulk mix, and
s = -5f + 4800 kilograms of the standard mix. We now show that these choices,
for any value of f, will yield a production schedule that exhausts all of the day’s
supply of raw ingredients. (We will very soon learn how to solve systems of equations
with infinitely many solutions and then determine expressions like these for b and
s.) Grab your pencil and paper and play along by substituting these choices for the
production schedule into the storage limits for each raw ingredient and simplifying
the algebra.

\begin{align*}
\frac{7}{15}(4f - 3300) + \frac{6}{15}(-5f + 4800) + \frac{2}{15}f &= 0f + \frac{5700}{15} = 380 \\
\frac{5}{15}(4f - 3300) + \frac{5}{15}(-5f + 4800) + \frac{5}{15}f &= 0f + \frac{7500}{15} = 500 \\
\frac{3}{15}(4f - 3300) + \frac{4}{15}(-5f + 4800) + \frac{8}{15}f &= 0f + \frac{9300}{15} = 620
\end{align*}
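
The same substitution can be spot-checked numerically. The sketch below assumes plain Python with the standard `fractions` module; `REVISED`, `schedule` and `exhausts_storage` are illustrative names. Testing a range of values is only a sanity check; it is the algebra above, where the coefficient of f vanishes, that establishes the claim for every f.

```python
from fractions import Fraction

# Revised recipes: each row holds the (bulk, standard, fancy) kg per 15 kg batch
# of one ingredient, in the order raisins, peanuts, chocolate.
REVISED = ((7, 6, 2), (5, 5, 5), (3, 4, 8))
STORAGE = (380, 500, 620)  # daily kg of raisins, peanuts, chocolate


def schedule(f):
    """Return (b, s, f) using the expressions b = 4f - 3300 and s = -5f + 4800."""
    return (4 * f - 3300, -5 * f + 4800, f)


def exhausts_storage(f):
    """True when the schedule for this f uses exactly the stored amounts."""
    b, s, fancy = schedule(f)
    return all(
        Fraction(cb * b + cs * s + cf * fancy, 15) == limit
        for (cb, cs, cf), limit in zip(REVISED, STORAGE)
    )


print(all(exhausts_storage(f) for f in range(-1000, 2001)))  # True
print(exhausts_storage(Fraction(1, 3)))  # True
```

Note that the equations are satisfied even for values of f that make b or s negative; the nonnegativity conditions are a separate requirement.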


Convince  yourself  that  these  expressions  for  b and  s allow  us  to  vary  f and  obtain 
an  infinite  number  of  possibilities  for  solutions  to  the  three  equations  that  describe 
our  storage  capacities.  As  a practical  matter,  there  really  are  not  an  infinite  number 
of  solutions,  since  we  are  unlikely  to  want  to  end  the  day  with  a fractional  number 
of bags of fancy mix, so our allowable values of f should probably be integers. More
importantly, we need to remember that we cannot make negative amounts of each
mix! Where does this lead us? A nonnegative quantity of the bulk mix requires that

\[ b \geq 0 \quad\Rightarrow\quad 4f - 3300 \geq 0 \quad\Rightarrow\quad f \geq 825 \]

Similarly for the standard mix,

\[ s \geq 0 \quad\Rightarrow\quad -5f + 4800 \geq 0 \quad\Rightarrow\quad f \leq 960 \]

So, as production manager, you really have to choose a value of f from the finite
set 

{825,  826,  . . . , 960} 

leaving  you  with  136  choices,  each  of  which  will  exhaust  the  day’s  supply  of  raw 
ingredients.  Pause  now  and  think  about  which  you  would  choose. 
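
The count of 136 can be confirmed by brute force. This is a minimal sketch in plain Python; the search range 0 to 2000 is an arbitrary assumption chosen to comfortably contain the feasible values, and `feasible` is an illustrative name.

```python
# Keep the integer values of f for which both b = 4f - 3300 and
# s = -5f + 4800 are nonnegative.
feasible = [
    f for f in range(0, 2001)
    if 4 * f - 3300 >= 0 and -5 * f + 4800 >= 0
]

print(feasible[0], feasible[-1], len(feasible))  # 825 960 136
```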

Recalling  your  weekly  meeting  with  the  CEO  suggests  that  you  might  want  to 
choose  a production  schedule  that  yields  the  biggest  possible  profit  for  the  company. 
So  you  compute  an  expression  for  the  profit  based  on  your  as  yet  undetermined 
decision for the value of f,

\[ (4f - 3300)(4.99 - 3.70) + (-5f + 4800)(5.50 - 3.85) + (f)(6.50 - 4.45) = -1.04f + 3663 \]

Since f has a negative coefficient it would appear that mixing fancy mix is
detrimental to your profit and should be avoided. So you will make the decision
to set daily fancy mix production at f = 825. This has the effect of setting b =
4(825) - 3300 = 0 and we stop producing bulk mix entirely. So the remainder of your
daily production is standard mix at the level of s = -5(825) + 4800 = 675 kilograms
and the resulting daily profit is (-1.04)(825) + 3663 = 2805. It is a pleasant surprise
that  daily  profit  has  risen  to  $2,805,  but  this  is  not  the  most  important  part  of  the 
story.  What  is  important  here  is  that  there  are  a large  number  of  ways  to  produce 
trail  mix  that  use  all  of  the  day’s  worth  of  raw  ingredients  and  you  were  able  to 
easily  choose  the  one  that  netted  the  largest  profit.  Notice  too  how  all  of  the  above 
computations  look  “linear.” 

In  the  food  industry,  things  do  not  stay  the  same  for  long,  and  now  the  sales 
department  says  that  increased  competition  has  led  to  the  decision  to  stay  competitive 
and  charge  just  $5.25  for  a kilogram  of  the  standard  mix,  rather  than  the  previous 
$5.50  per  kilogram.  This  decision  has  no  effect  on  the  possibilities  for  the  production 
schedule,  but  will  affect  the  decision  based  on  profit  considerations.  So  you  revisit 
just  the  profit  computation,  suitably  adjusted  for  the  new  selling  price  of  standard 
mix, 

\begin{align*}
&(4f - 3300)(4.99 - 3.70) + (-5f + 4800)(5.25 - 3.85) + (f)(6.50 - 4.45) \\
&\quad = 0.21f + 2463
\end{align*}

Now it would appear that fancy mix is beneficial to the company’s profit since
the value of \(f\) has a positive coefficient. So you take the decision to make as much
fancy mix as possible, setting \(f = 960\). This leads to \(s = -5(960) + 4800 = 0\) and the
increased competition has driven you out of the standard mix market altogether. The
remainder of production is therefore bulk mix at a daily level of \(b = 4(960) - 3300 = 540\)
kilograms and the resulting daily profit is \(0.21(960) + 2463 = 2664.60\). A daily
profit of $2,664.60 is less than it used to be, but as production manager, you have
made the best of a difficult situation and shown the sales department that the best
course is to pull out of the highly competitive standard mix market completely. △
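
The two profit decisions in this example can be replayed numerically. The following is a minimal Python sketch, not part of the original example. The function names (`schedule`, `feasible_values`, `profit`, `best_choice`) and the search bound of 2000 are assumptions made for illustration; the costs and prices are the ones appearing in the profit expressions of the example. It lists the integer amounts of fancy mix for which no mix is produced in a negative amount, then picks the most profitable one for a given selling price of standard mix.

```python
from fractions import Fraction

# Per-kilogram costs and prices, copied from the profit expressions in the example.
COST = {"bulk": Fraction("3.70"), "standard": Fraction("3.85"), "fancy": Fraction("4.45")}
PRICE = {"bulk": Fraction("4.99"), "fancy": Fraction("6.50")}


def schedule(f):
    """Return (b, s, f): kilograms of bulk, standard and fancy mix for a given f."""
    return 4 * f - 3300, -5 * f + 4800, f


def feasible_values(limit=2000):
    """Integer values of f with no negative production.

    The bound `limit` is an arbitrary search window (an assumption of this sketch).
    """
    return [f for f in range(limit + 1) if min(schedule(f)) >= 0]


def profit(f, standard_price):
    """Exact daily profit for a standard mix price given as a decimal string."""
    b, s, f = schedule(f)
    return (b * (PRICE["bulk"] - COST["bulk"])
            + s * (Fraction(standard_price) - COST["standard"])
            + f * (PRICE["fancy"] - COST["fancy"]))


def best_choice(standard_price):
    """The feasible value of f with the largest profit."""
    return max(feasible_values(), key=lambda f: profit(f, standard_price))


for price in ("5.50", "5.25"):
    f = best_choice(price)
    print(price, f, schedule(f), float(profit(f, price)))
```

Under these assumptions the sketch finds 136 feasible values and reproduces the two decisions of the example: 825 kilograms of fancy mix at a standard price of $5.50, and 960 kilograms at $5.25. Exact fractions are used so that the comparison of profits is not disturbed by floating-point rounding.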

This  example  is  taken  from  a field  of  mathematics  variously  known  by  names  such 
as  operations  research,  systems  science,  or  management  science.  More  specifically, 
this  is  a prototypical  example  of  problems  that  are  solved  by  the  techniques  of  “linear 
programming.” 

There  is  a lot  going  on  under  the  hood  in  this  example.  The  heart  of  the  matter  is 
the  solution  to  systems  of  linear  equations,  which  is  the  topic  of  the  next  few  sections, 
and  a recurrent  theme  throughout  this  course.  We  will  return  to  this  example  on 
several  occasions  to  reveal  some  of  the  reasons  for  its  behavior. 

Reading  Questions 

1. Is the equation \(x^2 + xy + \tan(y^3) = 0\) linear or not? Why or why not?

2. Find all solutions to the system of two linear equations \(2x + 3y = -8\), \(x - y = 6\).

3.  Describe  how  the  production  manager  might  explain  the  importance  of  the  procedures 
described  in  the  trail  mix  application  (Subsection  WILA.AA). 

Exercises 

C10 In Example TMP the first table lists the cost (per kilogram) to manufacture each of
the  three  varieties  of  trail  mix  (bulk,  standard,  fancy).  For  example,  it  costs  $3.69  to  make 
one  kilogram  of  the  bulk  variety.  Re-compute  each  of  these  three  costs  and  notice  that  the 
computations  are  linear  in  character. 

M70† In Example TMP two different prices were considered for marketing standard mix
with  the  revised  recipes  (one-third  peanuts  in  each  recipe).  Selling  standard  mix  at  $5.50 
resulted  in  selling  the  minimum  amount  of  the  fancy  mix  and  no  bulk  mix.  At  $5.25  it 
was  best  for  profits  to  sell  the  maximum  amount  of  fancy  mix  and  then  sell  no  standard 
mix.  Determine  a selling  price  for  standard  mix  that  allows  for  maximum  profits  while  still 
selling  some  of  each  type  of  mix. 


Section  SSLE 

Solving  Systems  of  Linear  Equations 


We  will  motivate  our  study  of  linear  algebra  by  considering  the  problem  of  solving 
several  linear  equations  simultaneously.  The  word  “solve”  tends  to  get  abused 
somewhat,  as  in  “solve  this  problem.”  When  talking  about  equations  we  understand 
a more  precise  meaning:  find  all  of  the  values  of  some  variable  quantities  that  make 
an  equation,  or  several  equations,  simultaneously  true. 

Subsection  SLE 

Systems  of  Linear  Equations 

Our  first  example  is  of  a type  we  will  not  pursue  further.  While  it  has  two  equations, 
the  first  is  not  linear.  So  this  is  a good  example  to  come  back  to  later,  especially 
after  you  have  seen  Theorem  PSSLS. 

Example  STNE  Solving  two  (nonlinear)  equations 

Suppose  we  desire  the  simultaneous  solutions  of  the  two  equations, 

\begin{align*}
x^2 + y^2 &= 1 \\
-x + \sqrt{3}\,y &= 0
\end{align*}

You can easily check by substitution that \(x = \frac{\sqrt{3}}{2},\ y = \frac{1}{2}\) and \(x = -\frac{\sqrt{3}}{2},\ y = -\frac{1}{2}\)
are both solutions. We need to also convince ourselves that these are the only
solutions. To see this, plot each equation on the \(xy\)-plane, which means to plot \((x, y)\)
pairs that make an individual equation true. In this case we get a circle centered at
the origin with radius 1 and a straight line through the origin with slope \(\frac{1}{\sqrt{3}}\). The
intersections of these two curves are our desired simultaneous solutions, and so we
believe from our plot that the two solutions we know already are indeed the only
ones. We like to write solutions as sets, so in this case we write the set of solutions as

\[
S = \left\{ \left( \tfrac{\sqrt{3}}{2},\ \tfrac{1}{2} \right),\ \left( -\tfrac{\sqrt{3}}{2},\ -\tfrac{1}{2} \right) \right\}
\]

△
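
The belief formed from the plot can also be backed up algebraically. The short derivation below is a sketch added for illustration and is not part of the example as stated: it solves the second equation for \(x\) and substitutes into the first, which leaves exactly two possibilities for \(y\).

```latex
\begin{align*}
-x + \sqrt{3}\,y = 0 \quad &\Longrightarrow \quad x = \sqrt{3}\,y \\
x^2 + y^2 = 1 \quad &\Longrightarrow \quad 3y^2 + y^2 = 1
  \quad \Longrightarrow \quad y^2 = \tfrac{1}{4}
  \quad \Longrightarrow \quad y = \pm\tfrac{1}{2} \\
y = \tfrac{1}{2} \quad &\Longrightarrow \quad x = \tfrac{\sqrt{3}}{2},
  \qquad
  y = -\tfrac{1}{2} \quad \Longrightarrow \quad x = -\tfrac{\sqrt{3}}{2}
\end{align*}
```

Every simultaneous solution must pass through these steps, so the two pairs found by substitution are the only ones.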

In  order  to  discuss  systems  of  linear  equations  carefully,  we  need  a precise 
definition.  And  before  we  do  that,  we  will  introduce  our  periodic  discussions  about 
“Proof  Techniques.”  Linear  algebra  is  an  excellent  setting  for  learning  how  to  read, 
understand  and  formulate  proofs.  But  this  is  a difficult  step  in  your  development  as 
a mathematician,  so  we  have  included  a series  of  short  essays  containing  advice  and 
explanations  to  help  you  along.  These  will  be  referenced  in  the  text  as  needed,  and 
are  also  collected  as  a list  you  can  consult  when  you  want  to  return  to  re-read  them. 
(Which  is  strongly  encouraged!) 

With a definition next, now is the time for the first of our proof techniques. So
study  Proof  Technique  D.  We’ll  be  right  here  when  you  get  back.  See  you  in  a bit. 

Definition  SLE  System  of  Linear  Equations 

A system of linear equations is a collection of \(m\) equations in the variable
quantities \(x_1, x_2, x_3, \ldots, x_n\) of the form,

\begin{align*}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2n}x_n &= b_2 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \cdots + a_{3n}x_n &= b_3 \\
&\vdots \\
a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \cdots + a_{mn}x_n &= b_m
\end{align*}

where the values of \(a_{ij}\), \(b_i\) and \(x_j\), \(1 \leq i \leq m\), \(1 \leq j \leq n\), are from the set of complex
numbers, \(\mathbb{C}\). □

Do not let the mention of the complex numbers, \(\mathbb{C}\), rattle you. We will stick with
real  numbers  exclusively  for  many  more  sections,  and  it  will  sometimes  seem  like 
we  only  work  with  integers!  However,  we  want  to  leave  the  possibility  of  complex 
numbers  open,  and  there  will  be  occasions  in  subsequent  sections  where  they  are 
necessary.  You  can  review  the  basic  properties  of  complex  numbers  in  Section  CNO, 
but  these  facts  will  not  be  critical  until  we  reach  Section  O. 

Now  we  make  the  notion  of  a solution  to  a linear  system  precise. 

Definition  SSLE  Solution  of  a System  of  Linear  Equations 

A solution of a system of linear equations in \(n\) variables, \(x_1, x_2, x_3, \ldots, x_n\) (such
as the system given in Definition SLE), is an ordered list of \(n\) complex numbers,
\(s_1, s_2, s_3, \ldots, s_n\) such that if we substitute \(s_1\) for \(x_1\), \(s_2\) for \(x_2\), \(s_3\) for \(x_3\), \(\ldots\), \(s_n\)
for \(x_n\), then for every equation of the system the left side will equal the right side,
i.e. each equation is true simultaneously. □

More typically, we will write a solution in a form like \(x_1 = 12\), \(x_2 = -7\), \(x_3 = 2\) to
mean that \(s_1 = 12\), \(s_2 = -7\), \(s_3 = 2\) in the notation of Definition SSLE. To discuss
all  of  the  possible  solutions  to  a system  of  linear  equations,  we  now  define  the  set 
of  all  solutions.  (So  Section  SET  is  now  applicable,  and  you  may  want  to  go  and 
familiarize  yourself  with  what  is  there.) 

Definition  SSSLE  Solution  Set  of  a System  of  Linear  Equations 

The solution set of a linear system of equations is the set which contains every
solution to the system, and nothing more. □

Be  aware  that  a solution  set  can  be  infinite,  or  there  can  be  no  solutions,  in 
which case we write the solution set as the empty set, \(\emptyset = \{\}\) (Definition ES). Here
is  an  example  to  illustrate  using  the  notation  introduced  in  Definition  SLE  and  the 
notion  of  a solution  (Definition  SSLE). 

Example NSE Notation for a system of equations

Given the system of linear equations,

\begin{align*}
x_1 + 2x_2 + x_4 &= 7 \\
x_1 + x_2 + x_3 - x_4 &= 3 \\
3x_1 + x_2 + 5x_3 - 7x_4 &= 1
\end{align*}

we have \(n = 4\) variables and \(m = 3\) equations. Also,

\begin{align*}
a_{11} &= 1 & a_{12} &= 2 & a_{13} &= 0 & a_{14} &= 1 & b_1 &= 7 \\
a_{21} &= 1 & a_{22} &= 1 & a_{23} &= 1 & a_{24} &= -1 & b_2 &= 3 \\
a_{31} &= 3 & a_{32} &= 1 & a_{33} &= 5 & a_{34} &= -7 & b_3 &= 1
\end{align*}

Additionally, convince yourself that \(x_1 = -2\), \(x_2 = 4\), \(x_3 = 2\), \(x_4 = 1\) is one
solution (Definition SSLE), but it is not the only one! For example, another solution
is \(x_1 = -12\), \(x_2 = 11\), \(x_3 = 1\), \(x_4 = -3\), and there are more to be found. So the
solution set contains at least two elements. △
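
Checking a proposed solution against Definition SSLE is purely mechanical: substitute, and compare the left and right side of every equation. As a small sketch in Python (the names `A`, `b` and `is_solution` are illustrative choices, not notation from the text), the coefficients \(a_{ij}\) are stored row by row and the constants \(b_i\) in a separate list.

```python
# Coefficients a_ij and constants b_i of the system in Example NSE.
A = [
    [1, 2, 0, 1],
    [1, 1, 1, -1],
    [3, 1, 5, -7],
]
b = [7, 3, 1]


def is_solution(A, b, s):
    """True when substituting the list s makes every equation true."""
    return all(
        sum(a_ij * s_j for a_ij, s_j in zip(row, s)) == b_i
        for row, b_i in zip(A, b)
    )


print(is_solution(A, b, [-2, 4, 2, 1]))     # first solution from the example
print(is_solution(A, b, [-12, 11, 1, -3]))  # second solution from the example
print(is_solution(A, b, [0, 0, 0, 0]))      # not a solution
```

Note the explicit zero for \(a_{13}\): the variable \(x_3\) does not appear in the first equation, but it still needs a coefficient when the system is written in the form of Definition SLE.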

We  will  often  shorten  the  term  “system  of  linear  equations”  to  “system  of 
equations”  leaving  the  linear  aspect  implied.  After  all,  this  is  a book  about  linear 
algebra. 

Subsection  PSS 
Possibilities  for  Solution  Sets 

The  next  example  illustrates  the  possibilities  for  the  solution  set  of  a system  of  linear 
equations.  We  will  not  be  too  formal  here,  and  the  necessary  theorems  to  back  up 
our  claims  will  come  in  subsequent  sections.  So  read  for  feeling  and  come  back  later 
to  revisit  this  example. 

Example  TTS  Three  typical  systems 

Consider  the  system  of  two  equations  with  two  variables, 

\begin{align*}
2x_1 + 3x_2 &= 3 \\
x_1 - x_2 &= 4
\end{align*}

If we plot the solutions to each of these equations separately on the \(x_1x_2\)-plane,
we get two lines, one with negative slope, the other with positive slope. They have
exactly one point in common, \((x_1, x_2) = (3, -1)\), which is the solution \(x_1 = 3\),
\(x_2 = -1\). From the geometry, we believe that this is the only solution to the system
of equations, and so we say it is unique.

Now  adjust  the  system  with  a different  second  equation, 

\begin{align*}
2x_1 + 3x_2 &= 3 \\
4x_1 + 6x_2 &= 6
\end{align*}

A plot of the solutions to these equations individually results in two lines, one on
top of the other! There are infinitely many points \((x_1, x_2)\) that make both equations
true.  We  will  learn  shortly  how  to  describe  this  infinite  solution  set  precisely  (see 
Example  SAA,  Theorem  VFSLS).  Notice  now  how  the  second  equation  is  just  a 
multiple  of  the  first. 

One  more  minor  adjustment  provides  a third  system  of  linear  equations, 

\begin{align*}
2x_1 + 3x_2 &= 3 \\
4x_1 + 6x_2 &= 10
\end{align*}

A plot now reveals two lines with identical slopes, i.e. parallel lines. They have
no points in common, and so the system has a solution set that is empty, \(S = \emptyset\). △

This  example  exhibits  all  of  the  typical  behaviors  of  a system  of  equations.  A 
subsequent  theorem  will  tell  us  that  every  system  of  linear  equations  has  a solution 
set  that  is  empty,  contains  a single  solution  or  contains  infinitely  many  solutions 
(Theorem  PSSLS).  Example  STNE  yielded  exactly  two  solutions,  but  this  does  not 
contradict  the  forthcoming  theorem.  The  equations  in  Example  STNE  are  not  linear 
because  they  do  not  match  the  form  of  Definition  SLE,  and  so  we  cannot  apply 
Theorem  PSSLS  in  this  case. 
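
The three behaviors in Example TTS can also be told apart by a computation rather than a plot. The sketch below is an illustration only: it relies on the \(2 \times 2\) determinant test, a tool this section has not introduced, and the function name `classify_2x2` is hypothetical. It assumes that each equation has at least one nonzero coefficient, so that each equation really does describe a line.

```python
from fractions import Fraction


def classify_2x2(a11, a12, b1, a21, a22, b2):
    """Classify  a11*x1 + a12*x2 = b1,  a21*x1 + a22*x2 = b2.

    Assumes each equation has at least one nonzero coefficient.
    Returns ("unique", (x1, x2)), ("infinite", None) or ("none", None).
    """
    det = a11 * a22 - a12 * a21
    if det != 0:
        # The lines cross in exactly one point.
        x1 = Fraction(b1 * a22 - a12 * b2, det)
        x2 = Fraction(a11 * b2 - b1 * a21, det)
        return ("unique", (x1, x2))
    # det == 0: the lines have identical slopes, so they coincide or are parallel.
    same_line = (a11 * b2 - a21 * b1 == 0) and (a12 * b2 - a22 * b1 == 0)
    return ("infinite", None) if same_line else ("none", None)


print(classify_2x2(2, 3, 3, 1, -1, 4))   # first system of Example TTS
print(classify_2x2(2, 3, 3, 4, 6, 6))    # second system
print(classify_2x2(2, 3, 3, 4, 6, 10))   # third system
```

For the three systems of Example TTS this reports a unique solution \((3, -1)\), infinitely many solutions, and no solutions, matching the geometric discussion.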

Subsection  ESEO 

Equivalent  Systems  and  Equation  Operations 

With  all  this  talk  about  finding  solution  sets  for  systems  of  linear  equations,  you 
might  be  ready  to  begin  learning  how  to  find  these  solution  sets  yourself.  We  begin 
with  our  first  definition  that  takes  a common  word  and  gives  it  a very  precise  meaning 
in  the  context  of  systems  of  linear  equations. 

Definition  ESYS  Equivalent  Systems 

Two  systems  of  linear  equations  are  equivalent  if  their  solution  sets  are  equal.  □ 

Notice  here  that  the  two  systems  of  equations  could  look  very  different  (i.e.  not 
be  equal),  but  still  have  equal  solution  sets,  and  we  would  then  call  the  systems 
equivalent.  Two  linear  equations  in  two  variables  might  be  plotted  as  two  lines 
that  intersect  in  a single  point.  A different  system,  with  three  equations  in  two 
variables  might  have  a plot  that  is  three  lines,  all  intersecting  at  a common  point, 
with  this  common  point  identical  to  the  intersection  point  for  the  first  system.  By  our 
definition,  we  could  then  say  these  two  very  different  looking  systems  of  equations 
are  equivalent,  since  they  have  identical  solution  sets.  It  is  really  like  a weaker  form 
of  equality,  where  we  allow  the  systems  to  be  different  in  some  respects,  but  we  use 
the  term  equivalent  to  highlight  the  situation  when  their  solution  sets  are  equal. 

With  this  definition,  we  can  begin  to  describe  our  strategy  for  solving  linear 
systems.  Given  a system  of  linear  equations  that  looks  difficult  to  solve,  we  would 
like  to  have  an  equivalent  system  that  is  easy  to  solve.  Since  the  systems  will  have 
equal solution sets, we can solve the “easy” system and get the solution set to the
“difficult”  system.  Here  come  the  tools  for  making  this  strategy  viable. 

Definition  EO  Equation  Operations 

Given  a system  of  linear  equations,  the  following  three  operations  will  transform  the 
system  into  a different  one,  and  each  operation  is  known  as  an  equation  operation. 

1.  Swap  the  locations  of  two  equations  in  the  list  of  equations. 

2.  Multiply  each  term  of  an  equation  by  a nonzero  quantity. 

3.  Multiply  each  term  of  one  equation  by  some  quantity,  and  add  these  terms  to 
a second  equation,  on  both  sides  of  the  equality.  Leave  the  first  equation  the 
same  after  this  operation,  but  replace  the  second  equation  by  the  new  one. 


□ 
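
To make the three operations concrete, here is one possible way to express them in Python. This is a sketch under stated assumptions, not a prescribed implementation: a system is represented as a list of equations, each equation is the list of its coefficients followed by its constant term, equations are numbered from 0 (Python's convention, unlike the text), and the names `swap`, `scale` and `add_multiple` are hypothetical. Each function returns a new system and leaves its input untouched.

```python
def swap(system, i, j):
    """Operation 1: swap the locations of equations i and j."""
    new = [row[:] for row in system]
    new[i], new[j] = new[j], new[i]
    return new


def scale(system, i, alpha):
    """Operation 2: multiply each term of equation i by a nonzero alpha."""
    if alpha == 0:
        raise ValueError("alpha must be nonzero")
    new = [row[:] for row in system]
    new[i] = [alpha * term for term in new[i]]
    return new


def add_multiple(system, i, j, alpha):
    """Operation 3: add alpha times equation i to equation j.

    Equation i is left the same; equation j is replaced.
    """
    if i == j:
        raise ValueError("the two equations must be different")
    new = [row[:] for row in system]
    new[j] = [alpha * s + t for s, t in zip(system[i], system[j])]
    return new


# The system  x1 + 2*x2 = 4,  3*x1 + 4*x2 = 10  (chosen only for illustration).
system = [[1, 2, 4], [3, 4, 10]]
print(swap(system, 0, 1))
print(scale(system, 0, 2))
print(add_multiple(system, 0, 1, -3))
```

The constant term is stored in the same list as the coefficients so that "on both sides of the equality" in the third operation happens automatically.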

These  descriptions  might  seem  a bit  vague,  but  the  proof  or  the  examples  that 
follow  should  make  it  clear  what  is  meant  by  each.  We  will  shortly  prove  a key 
theorem  about  equation  operations  and  solutions  to  linear  systems  of  equations. 

We  are  about  to  give  a rather  involved  proof,  so  a discussion  about  just  what  a 
theorem  really  is  would  be  timely.  Stop  and  read  Proof  Technique  T first. 

In  the  theorem  we  are  about  to  prove,  the  conclusion  is  that  two  systems  are 
equivalent.  By  Definition  ESYS  this  translates  to  requiring  that  solution  sets  be 
equal  for  the  two  systems.  So  we  are  being  asked  to  show  that  two  sets  are  equal. 
How  do  we  do  this?  Well,  there  is  a very  standard  technique,  and  we  will  use  it 
repeatedly  through  the  course.  If  you  have  not  done  so  already,  head  to  Section  SET 
and  familiarize  yourself  with  sets,  their  operations,  and  especially  the  notion  of  set 
equality,  Definition  SE,  and  the  nearby  discussion  about  its  use. 

The  following  theorem  has  a rather  long  proof.  This  chapter  contains  a few  very 
necessary  theorems  like  this,  with  proofs  that  you  can  safely  skip  on  a first  reading. 
You  might  come  back  to  them  later,  when  you  are  more  comfortable  with  reading 
and  studying  proofs. 

Theorem  EOPSS  Equation  Operations  Preserve  Solution  Sets 
If  we  apply  one  of  the  three  equation  operations  of  Definition  EO  to  a system  of 
linear  equations  (Definition  SLE),  then  the  original  system  and  the  transformed 
system  are  equivalent. 

Proof.  We  take  each  equation  operation  in  turn  and  show  that  the  solution  sets  of 
the  two  systems  are  equal,  using  the  definition  of  set  equality  (Definition  SE). 

1.  It  will  not  be  our  habit  in  proofs  to  resort  to  saying  statements  are  “obvious,” 
but  in  this  case,  it  should  be.  There  is  nothing  about  the  order  in  which  we 
write  linear  equations  that  affects  their  solutions,  so  the  solution  set  will  be 
equal  if  the  systems  only  differ  by  a rearrangement  of  the  order  of  the  equations. 

2. Suppose \(\alpha \neq 0\) is a number. Let us choose to multiply the terms of equation \(i\)
by \(\alpha\) to build the new system of equations,

\begin{align*}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2n}x_n &= b_2 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \cdots + a_{3n}x_n &= b_3 \\
&\vdots \\
\alpha a_{i1}x_1 + \alpha a_{i2}x_2 + \alpha a_{i3}x_3 + \cdots + \alpha a_{in}x_n &= \alpha b_i \\
&\vdots \\
a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \cdots + a_{mn}x_n &= b_m
\end{align*}

Let \(S\) denote the solutions to the system in the statement of the theorem, and
let \(T\) denote the solutions to the transformed system.

(a) Show \(S \subseteq T\). Suppose \((x_1, x_2, x_3, \ldots, x_n) = (\beta_1, \beta_2, \beta_3, \ldots, \beta_n) \in S\) is
a solution to the original system. Ignoring the \(i\)-th equation for a moment,
we know it makes all the other equations of the transformed system true.
We also know that

\[
a_{i1}\beta_1 + a_{i2}\beta_2 + a_{i3}\beta_3 + \cdots + a_{in}\beta_n = b_i
\]

which we can multiply by \(\alpha\) to get

\[
\alpha a_{i1}\beta_1 + \alpha a_{i2}\beta_2 + \alpha a_{i3}\beta_3 + \cdots + \alpha a_{in}\beta_n = \alpha b_i
\]

This says that the \(i\)-th equation of the transformed system is also true, so
we have established that \((\beta_1, \beta_2, \beta_3, \ldots, \beta_n) \in T\), and therefore \(S \subseteq T\).

(b) Now show \(T \subseteq S\). Suppose \((x_1, x_2, x_3, \ldots, x_n) = (\beta_1, \beta_2, \beta_3, \ldots, \beta_n) \in T\)
is a solution to the transformed system. Ignoring the \(i\)-th equation
for a moment, we know it makes all the other equations of the original
system true. We also know that

\[
\alpha a_{i1}\beta_1 + \alpha a_{i2}\beta_2 + \alpha a_{i3}\beta_3 + \cdots + \alpha a_{in}\beta_n = \alpha b_i
\]

which we can multiply by \(\frac{1}{\alpha}\), since \(\alpha \neq 0\), to get

\[
a_{i1}\beta_1 + a_{i2}\beta_2 + a_{i3}\beta_3 + \cdots + a_{in}\beta_n = b_i
\]

This says that the \(i\)-th equation of the original system is also true, so
we have established that \((\beta_1, \beta_2, \beta_3, \ldots, \beta_n) \in S\), and therefore \(T \subseteq S\).

Locate the key point where we required that \(\alpha \neq 0\), and consider what
would happen if \(\alpha = 0\).

3. Suppose \(\alpha\) is a number. Let us choose to multiply the terms of equation \(i\) by
\(\alpha\) and add them to equation \(j\) in order to build the new system of equations,

\begin{align*}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
a_{31}x_1 + a_{32}x_2 + \cdots + a_{3n}x_n &= b_3 \\
&\vdots \\
(\alpha a_{i1} + a_{j1})x_1 + (\alpha a_{i2} + a_{j2})x_2 + \cdots + (\alpha a_{in} + a_{jn})x_n &= \alpha b_i + b_j \\
&\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{align*}

Let \(S\) denote the solutions to the system in the statement of the theorem, and
let \(T\) denote the solutions to the transformed system.

(a) Show \(S \subseteq T\). Suppose \((x_1, x_2, x_3, \ldots, x_n) = (\beta_1, \beta_2, \beta_3, \ldots, \beta_n) \in S\) is
a solution to the original system. Ignoring the \(j\)-th equation for a moment,
we know this solution makes all the other equations of the transformed
system true. Using the fact that the solution makes the \(i\)-th and \(j\)-th
equations of the original system true, we find

\begin{align*}
&(\alpha a_{i1} + a_{j1})\beta_1 + (\alpha a_{i2} + a_{j2})\beta_2 + \cdots + (\alpha a_{in} + a_{jn})\beta_n \\
&\quad = (\alpha a_{i1}\beta_1 + \alpha a_{i2}\beta_2 + \cdots + \alpha a_{in}\beta_n) + (a_{j1}\beta_1 + a_{j2}\beta_2 + \cdots + a_{jn}\beta_n) \\
&\quad = \alpha(a_{i1}\beta_1 + a_{i2}\beta_2 + \cdots + a_{in}\beta_n) + (a_{j1}\beta_1 + a_{j2}\beta_2 + \cdots + a_{jn}\beta_n) \\
&\quad = \alpha b_i + b_j.
\end{align*}

This says that the \(j\)-th equation of the transformed system is also true, so
we have established that \((\beta_1, \beta_2, \beta_3, \ldots, \beta_n) \in T\), and therefore \(S \subseteq T\).

(b) Now show \(T \subseteq S\). Suppose \((x_1, x_2, x_3, \ldots, x_n) = (\beta_1, \beta_2, \beta_3, \ldots, \beta_n) \in T\)
is a solution to the transformed system. Ignoring the \(j\)-th equation
for a moment, we know it makes all the other equations of the original
system true. We then find

\begin{align*}
&a_{j1}\beta_1 + \cdots + a_{jn}\beta_n \\
&\quad = a_{j1}\beta_1 + \cdots + a_{jn}\beta_n + \alpha b_i - \alpha b_i \\
&\quad = a_{j1}\beta_1 + \cdots + a_{jn}\beta_n + (\alpha a_{i1}\beta_1 + \cdots + \alpha a_{in}\beta_n) - \alpha b_i \\
&\quad = a_{j1}\beta_1 + \alpha a_{i1}\beta_1 + \cdots + a_{jn}\beta_n + \alpha a_{in}\beta_n - \alpha b_i \\
&\quad = (\alpha a_{i1} + a_{j1})\beta_1 + \cdots + (\alpha a_{in} + a_{jn})\beta_n - \alpha b_i \\
&\quad = \alpha b_i + b_j - \alpha b_i \\
&\quad = b_j
\end{align*}

This says that the \(j\)-th equation of the original system is also true, so we
have established that \((\beta_1, \beta_2, \beta_3, \ldots, \beta_n) \in S\), and therefore \(T \subseteq S\).

Why did we not need to require that \(\alpha \neq 0\) for this equation operation? In other
words, how does the third statement of the theorem read when \(\alpha = 0\)? Does
our proof require some extra care when \(\alpha = 0\)? Compare your answers with
the similar situation for the second equation operation. (See Exercise SSLE.T20.)


Theorem  EOPSS  is  the  necessary  tool  to  complete  our  strategy  for  solving  systems 
of  equations.  We  will  use  equation  operations  to  move  from  one  system  to  another, 
all  the  while  keeping  the  solution  set  the  same.  With  the  right  sequence  of  operations, 
we will arrive at a simpler system to solve. The next two examples illustrate this
idea,  while  saving  some  of  the  details  for  later. 


Example  US  Three  equations,  one  solution 

We  solve  the  following  system  by  a sequence  of  equation  operations. 

\begin{align*}
x_1 + 2x_2 + 2x_3 &= 4 \\
x_1 + 3x_2 + 3x_3 &= 5 \\
2x_1 + 6x_2 + 5x_3 &= 6
\end{align*}

\(\alpha = -1\) times equation 1, add to equation 2:

\begin{align*}
x_1 + 2x_2 + 2x_3 &= 4 \\
0x_1 + 1x_2 + 1x_3 &= 1 \\
2x_1 + 6x_2 + 5x_3 &= 6
\end{align*}

\(\alpha = -2\) times equation 1, add to equation 3:

\begin{align*}
x_1 + 2x_2 + 2x_3 &= 4 \\
0x_1 + 1x_2 + 1x_3 &= 1 \\
0x_1 + 2x_2 + 1x_3 &= -2
\end{align*}

\(\alpha = -2\) times equation 2, add to equation 3:

\begin{align*}
x_1 + 2x_2 + 2x_3 &= 4 \\
0x_1 + 1x_2 + 1x_3 &= 1 \\
0x_1 + 0x_2 - 1x_3 &= -4
\end{align*}

\(\alpha = -1\) times equation 3:

\begin{align*}
x_1 + 2x_2 + 2x_3 &= 4 \\
0x_1 + 1x_2 + 1x_3 &= 1 \\
0x_1 + 0x_2 + 1x_3 &= 4
\end{align*}

which can be written more clearly as

\begin{align*}
x_1 + 2x_2 + 2x_3 &= 4 \\
x_2 + x_3 &= 1 \\
x_3 &= 4
\end{align*}

This is now a very easy system of equations to solve. The third equation requires
that \(x_3 = 4\). Making this substitution into equation 2 we arrive at \(x_2 = -3\),
and finally, substituting these values of \(x_2\) and \(x_3\) into the first equation, we find
that \(x_1 = 2\). Note too that this is the only solution to this final system of equations,
since we were forced to choose these values to make the equations true. Since we
performed equation operations on each system to obtain the next one in the list, all
of the systems listed here are equivalent to each other by Theorem EOPSS. Thus
\((x_1, x_2, x_3) = (2, -3, 4)\) is the unique solution to the original system of equations
(and all of the other intermediate systems of equations listed as we transformed one
into another). △
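
The sequence of equation operations in Example US can be replayed step by step. The following Python sketch is one way to do so and makes several assumptions of its own: each equation is stored as its three coefficients followed by its constant term, the helper names `add_multiple` and `scale` are hypothetical, equations are numbered from 1 as in the text, and the helpers modify the list `rows` in place. The last three lines carry out the substitutions described in the example.

```python
from fractions import Fraction


def add_multiple(rows, i, j, alpha):
    """Add alpha times equation i to equation j (numbered from 1), in place."""
    rows[j - 1] = [alpha * s + t for s, t in zip(rows[i - 1], rows[j - 1])]


def scale(rows, i, alpha):
    """Multiply each term of equation i (numbered from 1) by nonzero alpha, in place."""
    if alpha == 0:
        raise ValueError("alpha must be nonzero")
    rows[i - 1] = [alpha * t for t in rows[i - 1]]


# The system of Example US: coefficients of x1, x2, x3, then the constant term.
rows = [[Fraction(v) for v in row]
        for row in ([1, 2, 2, 4], [1, 3, 3, 5], [2, 6, 5, 6])]

add_multiple(rows, 1, 2, -1)   # alpha = -1 times equation 1, add to equation 2
add_multiple(rows, 1, 3, -2)   # alpha = -2 times equation 1, add to equation 3
add_multiple(rows, 2, 3, -2)   # alpha = -2 times equation 2, add to equation 3
scale(rows, 3, -1)             # alpha = -1 times equation 3

# Substitute from the last equation upward.
x3 = rows[2][3] / rows[2][2]
x2 = (rows[1][3] - rows[1][2] * x3) / rows[1][1]
x1 = (rows[0][3] - rows[0][1] * x2 - rows[0][2] * x3) / rows[0][0]
solution = (x1, x2, x3)
print(solution)
```

Exact fractions are used so the arithmetic matches hand computation; for this example every intermediate value happens to be an integer.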

Example  IS  Three  equations,  infinitely  many  solutions 

The following system of equations made an appearance earlier in this section
(Example NSE), where we listed one of its solutions. Now, we will try to find all of the
solutions  to  this  system.  Do  not  concern  yourself  too  much  about  why  we  choose 
this  particular  sequence  of  equation  operations,  just  believe  that  the  work  we  do  is 
all  correct. 

\begin{align*}
x_1 + 2x_2 + 0x_3 + x_4 &= 7 \\
x_1 + x_2 + x_3 - x_4 &= 3 \\
3x_1 + x_2 + 5x_3 - 7x_4 &= 1
\end{align*}

\(\alpha = -1\) times equation 1, add to equation 2:

\begin{align*}
x_1 + 2x_2 + 0x_3 + x_4 &= 7 \\
0x_1 - x_2 + x_3 - 2x_4 &= -4 \\
3x_1 + x_2 + 5x_3 - 7x_4 &= 1
\end{align*}

\(\alpha = -3\) times equation 1, add to equation 3:

\begin{align*}
x_1 + 2x_2 + 0x_3 + x_4 &= 7 \\
0x_1 - x_2 + x_3 - 2x_4 &= -4 \\
0x_1 - 5x_2 + 5x_3 - 10x_4 &= -20
\end{align*}

\(\alpha = -5\) times equation 2, add to equation 3:

\begin{align*}
x_1 + 2x_2 + 0x_3 + x_4 &= 7 \\
0x_1 - x_2 + x_3 - 2x_4 &= -4 \\
0x_1 + 0x_2 + 0x_3 + 0x_4 &= 0
\end{align*}

\(\alpha = -1\) times equation 2:

\begin{align*}
x_1 + 2x_2 + 0x_3 + x_4 &= 7 \\
0x_1 + x_2 - x_3 + 2x_4 &= 4 \\
0x_1 + 0x_2 + 0x_3 + 0x_4 &= 0
\end{align*}

\(\alpha = -2\) times equation 2, add to equation 1:

\begin{align*}
x_1 + 0x_2 + 2x_3 - 3x_4 &= -1 \\
0x_1 + x_2 - x_3 + 2x_4 &= 4 \\
0x_1 + 0x_2 + 0x_3 + 0x_4 &= 0
\end{align*}

which can be written more clearly as

\begin{align*}
x_1 + 2x_3 - 3x_4 &= -1 \\
x_2 - x_3 + 2x_4 &= 4 \\
0 &= 0
\end{align*}

What does the equation \(0 = 0\) mean? We can choose any values for \(x_1\), \(x_2\),
\(x_3\), \(x_4\) and this equation will be true, so we only need to consider further the first
two equations, since the third is true no matter what. We can analyze the second
equation without consideration of the variable \(x_1\). It would appear that there is
considerable latitude in how we can choose \(x_2\), \(x_3\), \(x_4\) and make this equation true.
Let us choose \(x_3\) and \(x_4\) to be anything we please, say \(x_3 = a\) and \(x_4 = b\).

Now we can take these arbitrary values for \(x_3\) and \(x_4\), substitute them in equation
1, to obtain

\begin{align*}
x_1 + 2a - 3b &= -1 \\
x_1 &= -1 - 2a + 3b
\end{align*}

Similarly, equation 2 becomes

\begin{align*}
x_2 - a + 2b &= 4 \\
x_2 &= 4 + a - 2b
\end{align*}

So our arbitrary choices of values for \(x_3\) and \(x_4\) (\(a\) and \(b\)) translate into specific
values of \(x_1\) and \(x_2\). The lone solution given in Example NSE was obtained by
choosing \(a = 2\) and \(b = 1\). Now we can easily and quickly find many more (infinitely
more). Suppose we choose \(a = 5\) and \(b = -2\), then we compute

\begin{align*}
x_1 &= -1 - 2(5) + 3(-2) = -17 \\
x_2 &= 4 + 5 - 2(-2) = 13
\end{align*}

and you can verify that \((x_1, x_2, x_3, x_4) = (-17, 13, 5, -2)\) makes all three equations
true. The entire solution set is written as

\[
S = \{\, (-1 - 2a + 3b,\ 4 + a - 2b,\ a,\ b) \mid a \in \mathbb{C},\ b \in \mathbb{C} \,\}
\]

It would be instructive to finish off your study of this example by taking the
general form of the solutions given in this set, substituting them into each of the
three equations, and verifying that they are true in each case (Exercise SSLE.M40). △
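
A computer cannot test infinitely many choices of \(a\) and \(b\), but spot-checking a grid of values is a quick sanity test of the general form of the solutions. The Python sketch below is an illustration, not a proof, and is no substitute for the symbolic substitution the example recommends; the names `solution` and `satisfies` are hypothetical.

```python
from itertools import product

# Coefficients and constants of the original system in Example IS.
A = [
    [1, 2, 0, 1],
    [1, 1, 1, -1],
    [3, 1, 5, -7],
]
b_vec = [7, 3, 1]


def solution(a, b):
    """The element of the solution set S determined by x3 = a and x4 = b."""
    return (-1 - 2 * a + 3 * b, 4 + a - 2 * b, a, b)


def satisfies(x):
    """True when x makes all three original equations true."""
    return all(
        sum(c * v for c, v in zip(row, x)) == rhs
        for row, rhs in zip(A, b_vec)
    )


# Every integer pair (a, b) with -5 <= a, b <= 5 gives a solution.
print(all(satisfies(solution(a, b)) for a, b in product(range(-5, 6), repeat=2)))

# The two choices worked out in the text.
print(solution(2, 1), solution(5, -2))

# A complex choice is allowed as well, since a and b range over the complex numbers.
print(satisfies(solution(1 + 2j, -1j)))
```

The choices \(a = 2\), \(b = 1\) and \(a = 5\), \(b = -2\) give back \((-2, 4, 2, 1)\) and \((-17, 13, 5, -2)\), the solutions already seen in Example NSE and in this example.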

In the next section we will describe how to use equation operations to systematically
solve any system of linear equations. But first, read one of our more important
pieces  of  advice  about  speaking  and  writing  mathematics.  See  Proof  Technique  L. 

Before  attacking  the  exercises  in  this  section,  it  will  be  helpful  to  read  some 
advice  on  getting  started  on  the  construction  of  a proof.  See  Proof  Technique  GS. 

Reading  Questions 

1.  How  many  solutions  does  the  system  of  equations  3x  + 2y  = 4,  6x  + 4y  = 8 have?  Explain 
your  answer. 

2. How many solutions does the system of equations 3x + 2y = 4, 6x + 4y = -2 have?
Explain  your  answer. 

3.  What  do  we  mean  when  we  say  mathematics  is  a language? 

Exercises 

C10 Find a solution to the system in Example IS where \(x_3 = 6\) and \(x_4 = 2\). Find two
other solutions to the system. Find a solution where \(x_1 = -17\) and \(x_2 = 14\). How many
possible answers are there to each of these questions?

C20  Each  archetype  (Archetypes)  that  is  a system  of  equations  begins  by  listing  some 
specific solutions. Verify the specific solutions listed in the following archetypes by
evaluating the system of equations with the solutions listed.

Archetype  A,  Archetype  B,  Archetype  C,  Archetype  D,  Archetype  E,  Archetype  F,  Archetype 
G,  Archetype  H,  Archetype  I,  Archetype  J 

C30† Find all solutions to the linear system:

\begin{align*}
x + y &= 5 \\
2x - y &= 3
\end{align*}

C31  Find  all  solutions  to  the  linear  system: 


\begin{align*}
3x + 2y &= 1 \\
x - y &= 2 \\
4x + 2y &= 2
\end{align*}

C32  Find  all  solutions  to  the  linear  system: 

\begin{align*}
x + 2y &= 8 \\
x - y &= 2 \\
x + y &= 4
\end{align*}

C33  Find  all  solutions  to  the  linear  system: 

\begin{align*}
x + y - z &= -1 \\
x - y - z &= -1 \\
z &= 2
\end{align*}

C34  Find  all  solutions  to  the  linear  system: 

\begin{align*}
x + y - z &= -5 \\
x - y - z &= -3 \\
x + y - z &= 0
\end{align*}

C50† A three-digit number has two properties. The tens-digit and the ones-digit add up
to  5.  If  the  number  is  written  with  the  digits  in  the  reverse  order,  and  then  subtracted 
from  the  original  number,  the  result  is  792.  Use  a system  of  equations  to  find  all  of  the 
three-digit  numbers  with  these  properties. 

C51† Find all of the six-digit numbers in which the first digit is one less than the second,
the  third  digit  is  half  the  second,  the  fourth  digit  is  three  times  the  third  and  the  last  two 
digits  form  a number  that  equals  the  sum  of  the  fourth  and  fifth.  The  sum  of  all  the  digits 
is  24.  (From  The  MENSA  Puzzle  Calendar  for  January  9,  2006.) 

C52† Driving along, Terry notices that the last four digits on his car’s odometer are
palindromic.  A mile  later,  the  last  five  digits  are  palindromic.  After  driving  another  mile,  the 
middle  four  digits  are  palindromic.  One  more  mile,  and  all  six  are  palindromic.  What  was 
the  odometer  reading  when  Terry  first  looked  at  it?  Form  a linear  system  of  equations  that 
expresses  the  requirements  of  this  puzzle.  (Car  Talk  Puzzler,  National  Public  Radio,  Week 
of  January  21,  2008)  (A  car  odometer  displays  six  digits  and  a sequence  is  a palindrome 
if  it  reads  the  same  left-to-right  as  right-to-left.) 

C53† An article in The Economist (“Free Exchange”, December 6, 2014) quotes the
following  problem  as  an  illustration  that  some  of  the  “underlying  assumptions  of  classical 
economics”  about  people’s  behavior  are  incorrect  and  “the  mind  plays  tricks.”  A bat  and 
ball  cost  $1.10  between  them.  How  much  does  each  cost?  Answer  this  quickly  with  no 
writing,  then  construct  system  of  linear  equations  and  solve  the  problem  carefully. 

M10† Each sentence below has at least two meanings. Identify the source of the double
meaning, and rewrite the sentence (at least twice) to clearly convey each meaning.


1.  They  are  baking  potatoes.

2.  He  bought  many  ripe  pears  and  apricots.

3.  She  likes  his  sculpture. 

4.  I decided  on  the  bus. 

M11† Discuss the difference in meaning of each of the following three almost identical
sentences, which all have the same grammatical structure. (These are due to Keith Devlin.)

1.  She  saw  him  in  the  park  with  a dog. 

2.  She  saw  him  in  the  park  with  a fountain. 

3.  She  saw  him  in  the  park  with  a telescope. 

M12† The following sentence, due to Noam Chomsky, has a correct grammatical structure,
but is meaningless. Critique its faults. “Colorless green ideas sleep furiously.” (Chomsky,
Noam. Syntactic Structures, The Hague/Paris: Mouton, 1957. p. 15.)

M13† Read the following sentence and form a mental picture of the situation.

The  baby  cried  and  the  mother  picked  it  up. 

What  assumptions  did  you  make  about  the  situation? 

M14  Discuss  the  difference  in  meaning  of  the  following  two  almost  identical  sentences, 
which  have  nearly  identical  grammatical  structure.  (This  antanaclasis  is  often  attributed  to 
the  comedian  Groucho  Marx,  but  has  earlier  roots.) 

1.  Time  flies  like  an  arrow. 

2.  Fruit  flies  like  a banana. 

M30† This problem appears in a middle-school mathematics textbook: Together Dan
and Diane have $20. Together Diane and Donna have $15. How much do the three of them
have in total? (Transition Mathematics, Second Edition, Scott Foresman Addison Wesley,
1998. Problem 5-1.19.)

M40  Solutions  to  the  system  in  Example  IS  are  given  as 

$(x_1, x_2, x_3, x_4) = (-1 - 2a + 3b,\ 4 + a - 2b,\ a,\ b)$

Evaluate  the  three  equations  of  the  original  system  with  these  expressions  in  a and  b and 
verify  that  each  equation  is  true,  no  matter  what  values  are  chosen  for  a and  b. 

M70† We have seen in this section that systems of linear equations have limited
possibilities for solution sets, and we will shortly prove Theorem PSSLS that describes these
possibilities exactly. This exercise will show that if we relax the requirement that our
equations be linear, then the possibilities expand greatly. Consider a system of two equations in
the two variables x and y, where the departure from linearity involves simply squaring the
variables.


§SSLE 


Beezer:  A First  Course  in  Linear  Algebra 


21 


$x^2 - y^2 = 1$
$x^2 + y^2 = 4$

After solving this system of nonlinear equations, replace the second equation in turn by
$x^2 + 2x + y^2 = 3$, $x^2 + y^2 = 1$, $x^2 - 4x + y^2 = -3$, $-x^2 + y^2 = 1$ and solve each resulting
system of two equations in two variables. (This exercise includes suggestions from Don
Kreher.)

T10† Proof Technique D asks you to formulate a definition of what it means for a whole
number to be odd. What is your definition? (Do not say “the opposite of even.”) Is 6 odd?
Is 11 odd? Justify your answers by using your definition.

T20† Explain why the second equation operation in Definition EO requires that the
scalar be nonzero, while in the third equation operation this restriction on the scalar is not
present.


Section  RREF 

Reduced  Row-Echelon  Form 


After  solving  a few  systems  of  equations,  you  will  recognize  that  it  does  not  matter  so 
much  what  we  call  our  variables,  as  opposed  to  what  numbers  act  as  their  coefficients. 
A system in the variables $x_1$, $x_2$, $x_3$ would behave the same if we changed the names
of the variables to $a$, $b$, $c$ and kept all the constants the same and in the same places.
In  this  section,  we  will  isolate  the  key  bits  of  information  about  a system  of  equations 
into  something  called  a matrix,  and  then  use  this  matrix  to  systematically  solve 
the  equations.  Along  the  way  we  will  obtain  one  of  our  most  important  and  useful 
computational  tools. 

Subsection  MVNSE 

Matrix  and  Vector  Notation  for  Systems  of  Equations 

Definition  M Matrix 

An $m \times n$ matrix is a rectangular layout of numbers from $\mathbb{C}$ having $m$ rows and
$n$ columns. We will use upper-case Latin letters from the start of the alphabet
($A$, $B$, $C$, ...) to denote matrices and squared-off brackets to delimit the layout.
Many use large parentheses instead of brackets; the distinction is not important.
Rows of a matrix will be referenced starting at the top and working down (i.e. row 1
is at the top) and columns will be referenced starting from the left (i.e. column 1 is
at the left). For a matrix $A$, the notation $[A]_{ij}$ will refer to the complex number in
row $i$ and column $j$ of $A$. □

Be careful with this notation for individual entries, since it is easy to think that
$[A]_{ij}$ refers to the whole matrix. It does not. It is just a number, but is a convenient
way to talk about the individual entries simultaneously. This notation will get a
heavy workout once we get to Chapter M.

Example  AM  A matrix 


\[
B = \begin{bmatrix} -1 & 2 & 5 & 3 \\ 1 & 0 & -6 & 1 \\ -4 & 2 & 2 & -2 \end{bmatrix}
\]

is a matrix with $m = 3$ rows and $n = 4$ columns. We can say that $[B]_{2,3} = -6$ while
$[B]_{3,4} = -2$. △
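
The entry notation carries over directly to code. Here is a minimal Python sketch, assuming a matrix is stored as a list of rows. The helper name `entry` is our own choice for illustration; its only job is to shift from the 1-based numbering of rows and columns used in this text to the 0-based indexing of Python lists.

```python
# The matrix B of Example AM, stored as a list of rows (an assumed representation).
B = [
    [-1, 2, 5, 3],
    [1, 0, -6, 1],
    [-4, 2, 2, -2],
]


def entry(M, i, j):
    """Return [M]_ij, with rows and columns numbered from 1 as in the text."""
    return M[i - 1][j - 1]


m, n = len(B), len(B[0])  # number of rows, number of columns
print(m, n)               # 3 4
print(entry(B, 2, 3))     # -6
print(entry(B, 3, 4))     # -2
```

Note that `entry(B, 2, 3)` is a single number, not a matrix, in keeping with the caution above.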

When we do equation operations on a system of equations, the names of the
variables really are not very important. Use $x_1$, $x_2$, $x_3$, or $a$, $b$, $c$, or $x$, $y$, $z$; it really
does not matter. In this subsection we will describe some notation that will make it
easier to describe linear systems, solve the systems and describe the solution sets.
Here is a list of definitions, laden with notation.

Definition CV Column Vector

A column vector of size $m$ is an ordered list of $m$ numbers, which is written in
order vertically, starting at the top and proceeding to the bottom. At times, we will
refer to a column vector as simply a vector. Column vectors will be written in bold,
usually with lower case Latin letters from the end of the alphabet such as $\mathbf{u}$, $\mathbf{v}$, $\mathbf{w}$,
$\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$. Some books like to write vectors with arrows, such as $\vec{u}$. Writing by hand,
some like to put arrows on top of the symbol, or a tilde underneath the symbol, as
in $\underset{\sim}{u}$. To refer to the entry or component of vector $\mathbf{v}$ in location $i$ of the list, we
write $[\mathbf{v}]_i$. □

Be careful with this notation. While the symbols $[\mathbf{v}]_i$ might look somewhat
substantial, as an object this represents just one entry of a vector, which is just a
single complex number.

Definition  ZCV  Zero  Column  Vector 

The zero vector of size $m$ is the column vector of size $m$ where each entry is the
number zero,

\[
\mathbf{0} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\]

or defined much more compactly, $[\mathbf{0}]_i = 0$ for $1 \le i \le m$. □

Definition  CM  Coefficient  Matrix 
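For a system of linear equations,

\begin{align*}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \dots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \dots + a_{2n}x_n &= b_2 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \dots + a_{3n}x_n &= b_3 \\
&\ \,\vdots \\
a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \dots + a_{mn}x_n &= b_m
\end{align*}

the coefficient matrix is the $m \times n$ matrix

\[
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & \dots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \dots & a_{2n} \\
a_{31} & a_{32} & a_{33} & \dots & a_{3n} \\
\vdots & & & & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \dots & a_{mn}
\end{bmatrix}
\]
□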

Definition VOC Vector of Constants
For a system of linear equations,

\begin{align*}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \dots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \dots + a_{2n}x_n &= b_2 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \dots + a_{3n}x_n &= b_3 \\
&\ \,\vdots \\
a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \dots + a_{mn}x_n &= b_m
\end{align*}

the vector of constants is the column vector of size $m$

\[
\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_m \end{bmatrix}
\]
□

Definition  SOLV  Solution  Vector 
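For a system of linear equations,

\begin{align*}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \dots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \dots + a_{2n}x_n &= b_2 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \dots + a_{3n}x_n &= b_3 \\
&\ \,\vdots \\
a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \dots + a_{mn}x_n &= b_m
\end{align*}

the solution vector is the column vector of size $n$

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix}
\]
□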


The  solution  vector  may  do  double-duty  on  occasion.  It  might  refer  to  a list  of 
variable  quantities  at  one  point,  and  subsequently  refer  to  values  of  those  variables 
that  actually  form  a particular  solution  to  that  system. 

Definition  MRLS  Matrix  Representation  of  a Linear  System 
If $A$ is the coefficient matrix of a system of linear equations and $\mathbf{b}$ is the vector of
constants, then we will write $\mathcal{LS}(A, \mathbf{b})$ as a shorthand expression for the system of
linear equations, which we will refer to as the matrix representation of the linear
system. □

Example  NSLE  Notation  for  systems  of  linear  equations 
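The system of linear equations

\begin{align*}
2x_1 + 4x_2 - 3x_3 + 5x_4 + x_5 &= 9 \\
3x_1 + x_2 + x_4 - 3x_5 &= 0 \\
-2x_1 + 7x_2 - 5x_3 + 2x_4 + 2x_5 &= -3
\end{align*}

has coefficient matrix

\[
A = \begin{bmatrix} 2 & 4 & -3 & 5 & 1 \\ 3 & 1 & 0 & 1 & -3 \\ -2 & 7 & -5 & 2 & 2 \end{bmatrix}
\]

and vector of constants

\[
\mathbf{b} = \begin{bmatrix} 9 \\ 0 \\ -3 \end{bmatrix}
\]

and so will be referenced as $\mathcal{LS}(A, \mathbf{b})$. △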

Definition  AM  Augmented  Matrix 

Suppose we have a system of $m$ equations in $n$ variables, with coefficient matrix $A$
and vector of constants $\mathbf{b}$. Then the augmented matrix of the system of equations
is the $m \times (n + 1)$ matrix whose first $n$ columns are the columns of $A$ and whose last
column ($n + 1$) is the column vector $\mathbf{b}$. This matrix will be written as $[A \mid \mathbf{b}]$. □


The  augmented  matrix  represents  all  the  important  information  in  the  system  of 
equations,  since  the  names  of  the  variables  have  been  ignored,  and  the  only  connection 
with  the  variables  is  the  location  of  their  coefficients  in  the  matrix.  It  is  important 
to  realize  that  the  augmented  matrix  is  just  that,  a matrix,  and  not  a system  of 
equations.  In  particular,  the  augmented  matrix  does  not  have  any  “solutions,”  though 
it  will  be  useful  for  finding  solutions  to  the  system  of  equations  that  it  is  associated 
with.  (Think  about  your  objects,  and  review  Proof  Technique  L.)  However,  notice 
that  an  augmented  matrix  always  belongs  to  some  system  of  equations,  and  vice 
versa,  so  it  is  tempting  to  try  and  blur  the  distinction  between  the  two.  Here  is  a 
quick  example. 


Example  AMAA  Augmented  matrix  for  Archetype  A 
Archetype A is the following system of 3 equations in 3 variables.

\begin{align*}
x_1 - x_2 + 2x_3 &= 1 \\
2x_1 + x_2 + x_3 &= 8 \\
x_1 + x_2 &= 5
\end{align*}

Here is its augmented matrix.

\[
\begin{bmatrix} 1 & -1 & 2 & 1 \\ 2 & 1 & 1 & 8 \\ 1 & 1 & 0 & 5 \end{bmatrix}
\]
△
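
Forming an augmented matrix is a purely mechanical step, which the following Python sketch illustrates for Archetype A. It assumes the coefficient matrix is a list of rows and the vector of constants is a flat list; the function name `augment` is hypothetical, chosen only for this illustration.

```python
def augment(A, b):
    """Return the augmented matrix [A | b] as a new list of rows."""
    if len(A) != len(b):
        raise ValueError("A and b must have the same number of rows")
    # Append the matching constant to a copy of each row of A.
    return [row + [b_i] for row, b_i in zip(A, b)]


# Coefficient matrix and vector of constants for Archetype A.
A = [[1, -1, 2],
     [2, 1, 1],
     [1, 1, 0]]
b = [1, 8, 5]

for row in augment(A, b):
    print(row)
# [1, -1, 2, 1]
# [2, 1, 1, 8]
# [1, 1, 0, 5]
```

The result is just a matrix with one more column than $A$; nothing in it remembers the names of the variables.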


Subsection  RO 
Row  Operations 

An  augmented  matrix  for  a system  of  equations  will  save  us  the  tedium  of  continually 
writing  down  the  names  of  the  variables  as  we  solve  the  system.  It  will  also  release 
us  from  any  dependence  on  the  actual  names  of  the  variables.  We  have  seen  how 
certain  operations  we  can  perform  on  equations  (Definition  EO)  will  preserve  their 
solutions  (Theorem  EOPSS).  The  next  two  definitions  and  the  following  theorem 
carry  over  these  ideas  to  augmented  matrices. 

Definition  RO  Row  Operations 

The following three operations will transform an $m \times n$ matrix into a different matrix
of the same size, and each is known as a row operation.

1.  Swap  the  locations  of  two  rows. 

2.  Multiply  each  entry  of  a single  row  by  a nonzero  quantity. 

3.  Multiply  each  entry  of  one  row  by  some  quantity,  and  add  these  values  to  the 
entries  in  the  same  columns  of  a second  row.  Leave  the  first  row  the  same  after 
this  operation,  but  replace  the  second  row  by  the  new  values. 


We will use a symbolic shorthand to describe these row operations:

1.  $R_i \leftrightarrow R_j$: Swap the location of rows $i$ and $j$.

2.  $\alpha R_i$: Multiply row $i$ by the nonzero scalar $\alpha$.

3.  $\alpha R_i + R_j$: Multiply row $i$ by the scalar $\alpha$ and add to row $j$.

□
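
The three row operations can be sketched as three small Python functions. This is an illustration under stated assumptions, not a prescribed implementation: matrices are lists of rows, rows are numbered from 1 as in the text, and each function returns a new matrix rather than changing its input. The names `swap_rows`, `scale_row` and `add_multiple` are our own.

```python
def swap_rows(M, i, j):
    """R_i <-> R_j: return a copy of M with rows i and j swapped."""
    N = [row[:] for row in M]
    N[i - 1], N[j - 1] = N[j - 1], N[i - 1]
    return N


def scale_row(M, alpha, i):
    """alpha R_i: return a copy of M with row i multiplied by a nonzero alpha."""
    if alpha == 0:
        raise ValueError("the scalar must be nonzero")
    N = [row[:] for row in M]
    N[i - 1] = [alpha * x for x in N[i - 1]]
    return N


def add_multiple(M, alpha, i, j):
    """alpha R_i + R_j: add alpha times row i to row j; row i is left unchanged."""
    N = [row[:] for row in M]
    N[j - 1] = [alpha * x + y for x, y in zip(N[i - 1], N[j - 1])]
    return N


M = [[2, 4], [1, 3]]
print(swap_rows(M, 1, 2))         # [[1, 3], [2, 4]]
print(scale_row(M, 3, 2))         # [[2, 4], [3, 9]]
print(add_multiple(M, -2, 2, 1))  # [[0, -2], [1, 3]]
```

Only `scale_row` rejects a zero scalar, mirroring the definition: the second operation requires a nonzero quantity, while the third does not.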


Definition  REM  Row-Equivalent  Matrices 

Two matrices, $A$ and $B$, are row-equivalent if one can be obtained from the other
by a sequence of row operations. □

Example  TREM  Two  row-equivalent  matrices 
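The matrices

\[
A = \begin{bmatrix} 2 & -1 & 3 & 4 \\ 5 & 2 & -2 & 3 \\ 1 & 1 & 0 & 6 \end{bmatrix}
\qquad
B = \begin{bmatrix} 1 & 1 & 0 & 6 \\ 3 & 0 & -2 & -9 \\ 2 & -1 & 3 & 4 \end{bmatrix}
\]

are row-equivalent as can be seen from

\begin{align*}
\begin{bmatrix} 2 & -1 & 3 & 4 \\ 5 & 2 & -2 & 3 \\ 1 & 1 & 0 & 6 \end{bmatrix}
&\xrightarrow{R_1 \leftrightarrow R_3}
\begin{bmatrix} 1 & 1 & 0 & 6 \\ 5 & 2 & -2 & 3 \\ 2 & -1 & 3 & 4 \end{bmatrix} \\
&\xrightarrow{-2R_1 + R_2}
\begin{bmatrix} 1 & 1 & 0 & 6 \\ 3 & 0 & -2 & -9 \\ 2 & -1 & 3 & 4 \end{bmatrix}
\end{align*}

We can also say that any pair of these three matrices are row-equivalent. △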
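
The two row operations of Example TREM can be checked mechanically. The Python sketch below assumes matrices are lists of rows with rows numbered from 1; the helper names are hypothetical. It also applies two further row operations that take $B$ back to $A$, which is one way to see that the relationship runs in both directions.

```python
def swap_rows(M, i, j):
    """R_i <-> R_j on a copy of M (rows numbered from 1)."""
    N = [row[:] for row in M]
    N[i - 1], N[j - 1] = N[j - 1], N[i - 1]
    return N


def add_multiple(M, alpha, i, j):
    """alpha R_i + R_j on a copy of M (rows numbered from 1)."""
    N = [row[:] for row in M]
    N[j - 1] = [alpha * x + y for x, y in zip(N[i - 1], N[j - 1])]
    return N


A = [[2, -1, 3, 4], [5, 2, -2, 3], [1, 1, 0, 6]]
B = [[1, 1, 0, 6], [3, 0, -2, -9], [2, -1, 3, 4]]

# A to B: first R_1 <-> R_3, then -2 R_1 + R_2.
forward = add_multiple(swap_rows(A, 1, 3), -2, 1, 2)
print(forward == B)   # True

# B to A: first 2 R_1 + R_2, then R_1 <-> R_3.
backward = swap_rows(add_multiple(B, 2, 1, 2), 1, 3)
print(backward == A)  # True
```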

Notice  that  each  of  the  three  row  operations  is  reversible  (Exercise  RREF.T10), 
so  we  do  not  have  to  be  careful  about  the  distinction  between  “A  is  row-equivalent 
to  B”  and  “B  is  row-equivalent  to  A.”  (Exercise  RREF.T11) 

The  preceding  definitions  are  designed  to  make  the  following  theorem  possible. 
It  says  that  row-equivalent  matrices  represent  systems  of  linear  equations  that  have 
identical  solution  sets. 

Theorem  REMES  Row-Equivalent  Matrices  represent  Equivalent  Systems 
Suppose  that  A and  B are  row-equivalent  augmented  matrices.  Then  the  systems  of 
linear  equations  that  they  represent  are  equivalent  systems. 

Proof.  If  we  perform  a single  row  operation  on  an  augmented  matrix,  it  will  have  the 
same  effect  as  if  we  did  the  analogous  equation  operation  on  the  system  of  equations 
the  matrix  represents.  By  exactly  the  same  methods  as  we  used  in  the  proof  of 
Theorem  EOPSS  we  can  see  that  each  of  these  row  operations  will  preserve  the  set 
of  solutions  for  the  system  of  equations  the  matrix  represents.  ■ 

So  at  this  point,  our  strategy  is  to  begin  with  a system  of  equations,  represent 
the  system  by  an  augmented  matrix,  perform  row  operations  (which  will  preserve 
solutions  for  the  system)  to  get  a “simpler”  augmented  matrix,  convert  back  to  a 
“simpler”  system  of  equations  and  then  solve  that  system,  knowing  that  its  solutions 
are  those  of  the  original  system.  Here  is  a rehash  of  Example  US  as  an  exercise  in 
using  our  new  tools. 

Example  USR  Three  equations,  one  solution,  reprised 

We  solve  the  following  system  using  augmented  matrices  and  row  operations.  This 
is  the  same  system  of  equations  solved  in  Example  US  using  equation  operations. 

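\begin{align*}
x_1 + 2x_2 + 2x_3 &= 4 \\
x_1 + 3x_2 + 3x_3 &= 5 \\
2x_1 + 6x_2 + 5x_3 &= 6
\end{align*}

Form the augmented matrix,

\[
A = \begin{bmatrix} 1 & 2 & 2 & 4 \\ 1 & 3 & 3 & 5 \\ 2 & 6 & 5 & 6 \end{bmatrix}
\]

and apply row operations,

\begin{align*}
&\xrightarrow{-1R_1 + R_2}
\begin{bmatrix} 1 & 2 & 2 & 4 \\ 0 & 1 & 1 & 1 \\ 2 & 6 & 5 & 6 \end{bmatrix}
\xrightarrow{-2R_1 + R_3}
\begin{bmatrix} 1 & 2 & 2 & 4 \\ 0 & 1 & 1 & 1 \\ 0 & 2 & 1 & -2 \end{bmatrix} \\
&\xrightarrow{-2R_2 + R_3}
\begin{bmatrix} 1 & 2 & 2 & 4 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & -1 & -4 \end{bmatrix}
\xrightarrow{-1R_3}
\begin{bmatrix} 1 & 2 & 2 & 4 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 4 \end{bmatrix}
\end{align*}

So the matrix

\[
B = \begin{bmatrix} 1 & 2 & 2 & 4 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 4 \end{bmatrix}
\]

is row equivalent to $A$ and by Theorem REMES the system of equations below has
the same solution set as the original system of equations.

\begin{align*}
x_1 + 2x_2 + 2x_3 &= 4 \\
x_2 + x_3 &= 1 \\
x_3 &= 4
\end{align*}

Solving this “simpler” system is straightforward and is identical to the process
in Example US. △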
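
The last step, solving the “simpler” system from the bottom equation upward, can be sketched in Python. The function name `back_substitute` is our own, and the sketch assumes an augmented matrix with $n$ rows and $n + 1$ columns whose coefficient part has zeros below the diagonal and nonzero entries on it, as the matrix $B$ of this example does. Exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction


def back_substitute(U):
    """Solve the system represented by an n x (n+1) augmented matrix U whose
    coefficient part is upper triangular with a nonzero diagonal."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):          # last equation first
        known = sum(U[i][k] * x[k] for k in range(i + 1, n))
        x[i] = Fraction(U[i][n] - known) / U[i][i]
    return x


B = [[1, 2, 2, 4],
     [0, 1, 1, 1],
     [0, 0, 1, 4]]
x = back_substitute(B)
print([int(v) for v in x])   # [2, -3, 4]

# Check the values against the original system of Example USR.
original = [[1, 2, 2, 4], [1, 3, 3, 5], [2, 6, 5, 6]]
print(all(sum(a * v for a, v in zip(row[:3], x)) == row[3] for row in original))  # True
```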


Subsection  RREF 
Reduced  Row-Echelon  Form 

The  preceding  example  amply  illustrates  the  definitions  and  theorems  we  have  seen 
so  far.  But  it  still  leaves  two  questions  unanswered.  Exactly  what  is  this  “simpler” 
form  for  a matrix,  and  just  how  do  we  get  it?  Here  is  the  answer  to  the  first  question, 
a definition  of  reduced  row-echelon  form. 

Definition  RREF  Reduced  Row-Echelon  Form 

A matrix  is  in  reduced  row-echelon  form  if  it  meets  all  of  the  following  conditions: 

1.  If  there  is  a row  where  every  entry  is  zero,  then  this  row  lies  below  any  other 
row  that  contains  a nonzero  entry. 

2.  The  leftmost  nonzero  entry  of  a row  is  equal  to  1. 

3.  The  leftmost  nonzero  entry  of  a row  is  the  only  nonzero  entry  in  its  column. 

4.  Consider  any  two  different  leftmost  nonzero  entries,  one  located  in  row  i, 
column  j and  the  other  located  in  row  s,  column  t.  If  s > i,  then  t > j . 

A row of only zero entries is called a zero row and the leftmost nonzero entry
of a nonzero row is a leading 1. A column containing a leading 1 will be called a
pivot column. The number of nonzero rows will be denoted by $r$, which is also
equal to the number of leading 1’s and the number of pivot columns.

The set of column indices for the pivot columns will be denoted by
$D = \{d_1, d_2, d_3, \dots, d_r\}$ where $d_1 < d_2 < d_3 < \dots < d_r$, while the columns that
are not pivot columns will be denoted as $F = \{f_1, f_2, f_3, \dots, f_{n-r}\}$ where
$f_1 < f_2 < f_3 < \dots < f_{n-r}$.

□ 


The principal feature of reduced row-echelon form is the pattern of leading 1’s
guaranteed by conditions (2) and (4), reminiscent of a flight of geese, or steps in a
staircase, or water cascading down a mountain stream.

There  are  a number  of  new  terms  and  notation  introduced  in  this  definition, 
which  should  make  you  suspect  that  this  is  an  important  definition.  Given  all  there 
is  to  digest  here,  we  will  mostly  save  the  use  of  D and  F until  Section  TSS.  However, 
one  important  point  to  make  here  is  that  all  of  these  terms  and  notation  apply  to  a 
matrix.  Sometimes  we  will  employ  these  terms  and  sets  for  an  augmented  matrix, 
and  other  times  it  might  be  a coefficient  matrix.  So  always  give  some  thought  to 
exactly  which  type  of  matrix  you  are  analyzing. 


Example  RREF  A matrix  in  reduced  row-echelon  form 
The matrix $C$ is in reduced row-echelon form.

\[
C = \begin{bmatrix}
1 & -3 & 0 & 6 & 0 & 0 & -5 & 9 \\
0 & 0 & 0 & 0 & 1 & 0 & 3 & -7 \\
0 & 0 & 0 & 0 & 0 & 1 & 7 & 3 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

This matrix has two zero rows and three pivot columns. So $r = 3$. Columns 1, 5,
and 6 are the three pivot columns, so $D = \{1, 5, 6\}$ and then $F = \{2, 3, 4, 7, 8\}$. △
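
For a matrix already known to be in reduced row-echelon form, the quantities $r$, $D$ and $F$ can be read off row by row. The following Python sketch does this for the matrix $C$ of Example RREF. The function name `pivot_data` is hypothetical, and the code assumes its input really is in reduced row-echelon form; it makes no attempt to verify that.

```python
def pivot_data(R):
    """Return (r, D, F) for a matrix R assumed to be in reduced row-echelon
    form, with column indices numbered from 1 as in the text."""
    n = len(R[0])
    D = []
    for row in R:
        for j, x in enumerate(row, start=1):
            if x != 0:
                D.append(j)   # column holding the leading 1 of this row
                break
    F = [j for j in range(1, n + 1) if j not in D]
    return len(D), D, F


C = [
    [1, -3, 0, 6, 0, 0, -5, 9],
    [0, 0, 0, 0, 1, 0, 3, -7],
    [0, 0, 0, 0, 0, 1, 7, 3],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(pivot_data(C))   # (3, [1, 5, 6], [2, 3, 4, 7, 8])
```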


Example  NRREF  A matrix  not  in  reduced  row-echelon  form 

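The matrix $E$ is not in reduced row-echelon form, as it fails each of the four
requirements once.

\[
E = \begin{bmatrix}
1 & 0 & -3 & 0 & 6 & 0 & 7 & -5 & 9 \\
0 & 0 & 0 & 5 & 0 & 1 & 0 & 3 & -7 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & -4 & 2 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 7 & 3 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]
△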
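
The four conditions of Definition RREF can each be tested directly. The Python sketch below is one possible way to do so; the function name `failed_conditions` is our own, and matrices are assumed to be lists of rows. It reports which of the four numbered conditions a matrix violates, and is applied to the matrices $C$ and $E$ of the two preceding examples.

```python
def failed_conditions(M):
    """Return the set of condition numbers (1 to 4) of Definition RREF
    that the matrix M violates; an empty set means M is in RREF."""
    failed = set()
    leads = []               # (row, column) of each leftmost nonzero entry
    seen_zero_row = False
    for i, row in enumerate(M):
        nonzero = [j for j, x in enumerate(row) if x != 0]
        if not nonzero:
            seen_zero_row = True
            continue
        if seen_zero_row:
            failed.add(1)    # a nonzero row lies below a zero row
        j = nonzero[0]
        if row[j] != 1:
            failed.add(2)    # leftmost nonzero entry is not 1
        if any(M[k][j] != 0 for k in range(len(M)) if k != i):
            failed.add(3)    # another nonzero entry shares its column
        leads.append((i, j))
    for i, j in leads:
        for s, t in leads:
            if s > i and not t > j:
                failed.add(4)  # a lower leading entry is not further right
    return failed


C = [
    [1, -3, 0, 6, 0, 0, -5, 9],
    [0, 0, 0, 0, 1, 0, 3, -7],
    [0, 0, 0, 0, 0, 1, 7, 3],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
E = [
    [1, 0, -3, 0, 6, 0, 7, -5, 9],
    [0, 0, 0, 5, 0, 1, 0, 3, -7],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, -4, 2],
    [0, 0, 0, 0, 0, 0, 1, 7, 3],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
print(sorted(failed_conditions(C)))   # []
print(sorted(failed_conditions(E)))   # [1, 2, 3, 4]
```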


Our  next  theorem  has  a “constructive”  proof.  Learn  about  the  meaning  of  this 
term  in  Proof  Technique  C. 


Theorem  REMEF  Row-Equivalent  Matrix  in  Echelon  Form 
Suppose  A is  a matrix.  Then  there  is  a matrix  B so  that

1.  A and  B are  row-equivalent.

2.  B is  in  reduced  row-echelon  form. 

Proof.  Suppose  that  A has  m rows  and  n columns.  We  will  describe  a process  for 
converting  A into  B via  row  operations.  This  procedure  is  known  as  Gauss-Jordan 
elimination.  Tracing  through  this  procedure  will  be  easier  if  you  recognize  that  i 
refers  to  a row  that  is  being  converted,  j refers  to  a column  that  is  being  converted, 
and  r keeps  track  of  the  number  of  nonzero  rows.  Here  we  go. 

1.  Set  j = 0 and  r = 0. 

2.  Increase  j by  1.  If  j now  equals  n + 1,  then  stop. 

3.  Examine  the  entries  of  A in  column  j located  in  rows  r + 1 through  m.  If  all 
of  these  entries  are  zero,  then  go  to  Step  2. 

4.  Choose  a row  from  rows  r + 1 through  m with  a nonzero  entry  in  column  j. 
Let  i denote  the  index  for  this  row. 

5.  Increase  r by  1. 

6.  Use  the  first  row  operation  to  swap  rows  i and  r. 

7.  Use the second row operation to convert the entry in row r and column j to a 1.

8.  Use  the  third  row  operation  with  row  r to  convert  every  other  entry  of  column 
j to  zero. 

9.  Go  to  Step  2. 

The  result  of  this  procedure  is  that  the  matrix  A is  converted  to  a matrix  in 
reduced  row-echelon  form,  which  we  will  refer  to  as  B.  We  need  to  now  prove  this 
claim  by  showing  that  the  converted  matrix  has  the  requisite  properties  of  Definition 
RREF.  First,  the  matrix  is  only  converted  through  row  operations  (Steps  6,  7,  8),  so 
A and  B are  row-equivalent  (Definition  REM). 

It  is  a bit  more  work  to  be  certain  that  B is  in  reduced  row-echelon  form.  We 
claim  that  as  we  begin  Step  2,  the  first  j columns  of  the  matrix  are  in  reduced 
row-echelon  form  with  r nonzero  rows.  Certainly  this  is  true  at  the  start  when  j = 0, 
since  the  matrix  has  no  columns  and  so  vacuously  meets  the  conditions  of  Definition 
RREF  with  r = 0 nonzero  rows. 

In  Step  2 we  increase  j by  1 and  begin  to  work  with  the  next  column.  There 
are  two  possible  outcomes  for  Step  3.  Suppose  that  every  entry  of  column  j in  rows 
r + 1 through m is zero. Then with no changes we recognize that the first j columns
of the matrix have their first r rows still in reduced row-echelon form, with the final
$m - r$ rows still all zero.

Suppose instead that the entry in row $i$ of column $j$ is nonzero. Notice that since
$r + 1 \le i \le m$, we know the first $j - 1$ entries of this row are all zero. Now, in Step
5 we increase $r$ by 1, and then embark on building a new nonzero row. In Step 6 we
swap row $r$ and row $i$. In the first $j$ columns, the first $r - 1$ rows remain in reduced
row-echelon form after the swap. In Step 7 we multiply row $r$ by a nonzero scalar,
creating a 1 in the entry in column $j$ of row $r$, and not changing any other rows.
This new leading 1 is the first nonzero entry in its row, and is located to the right of
all the leading 1’s in the preceding $r - 1$ rows. With Step 8 we ensure that every
other entry in the column with this new leading 1 is now zero, as required for reduced
row-echelon form. Also, rows $r + 1$ through $m$ are now all zeros in the first $j$ columns,
so we now only have one new nonzero row, consistent with our increase of $r$ by one.
Furthermore, since the first $j - 1$ entries of row $r$ are zero, the employment of the
third row operation does not destroy any of the necessary features of rows 1 through
$r - 1$ and rows $r + 1$ through $m$, in columns 1 through $j - 1$.

So  at  this  stage,  the  first  j columns  of  the  matrix  are  in  reduced  row-echelon 
form.  When  Step  2 finally  increases  j to  n + 1,  then  the  procedure  is  completed  and 
the  full  n columns  of  the  matrix  are  in  reduced  row-echelon  form,  with  the  value  of 
r correctly  recording  the  number  of  nonzero  rows.  ■ 

The  procedure  given  in  the  proof  of  Theorem  REMEF  can  be  more  precisely 
described  using  a pseudo-code  version  of  a computer  program.  Single-letter  variables, 
like m, n, i, j, r have the same meanings as above. := is assignment of the value
on the right to the variable on the left, A[i,j] is the equivalent of the matrix entry
$[A]_{ij}$, while == is an equality test and <> is a “not equals” test.

input m, n and A
r := 0
for j := 1 to n
    i := r+1
    while i <= m and A[i,j] == 0
        i := i+1
    if i < m+1
        r := r+1
        swap rows i and r of A (row op 1)
        scale A[r,j] to a leading 1 (row op 2)
        for k := 1 to m, k <> r
            make A[k,j] zero (row op 3, employing row r)
output r and A
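
The pseudo-code above can be turned into a short runnable Python sketch. It is an illustration rather than a definitive implementation: the function name `rref` is our own, matrices are assumed to be lists of rows, exact fractions from the standard library replace general complex numbers, and Python's 0-based indexing means that `r` counts the pivot rows found so far while also naming the position of the next one.

```python
from fractions import Fraction


def rref(A):
    """Return (r, B): B is the reduced row-echelon form of A and r is its
    number of nonzero rows, following the procedure of Theorem REMEF."""
    B = [[Fraction(x) for x in row] for row in A]
    m = len(B)
    n = len(B[0]) if m else 0
    r = 0
    for j in range(n):
        # Search rows r, r+1, ..., m-1 for a nonzero entry in column j.
        i = r
        while i < m and B[i][j] == 0:
            i += 1
        if i < m:
            B[i], B[r] = B[r], B[i]                      # row op 1
            pivot = B[r][j]
            B[r] = [x / pivot for x in B[r]]             # row op 2
            for k in range(m):
                if k != r and B[k][j] != 0:
                    factor = -B[k][j]                    # row op 3
                    B[k] = [factor * p + x for p, x in zip(B[r], B[k])]
            r += 1
    return r, B


# Augmented matrix of Example USR.
r, B = rref([[1, 2, 2, 4], [1, 3, 3, 5], [2, 6, 5, 6]])
print(r)                                    # 3
print([[int(x) for x in row] for row in B])
# [[1, 0, 0, 2], [0, 1, 0, -3], [0, 0, 1, 4]]
```

The last column of the output records the single solution $x_1 = 2$, $x_2 = -3$, $x_3 = 4$ of that system.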

Notice  that  as  a practical  matter  the  “and”  used  in  the  conditional  statement  of
the  while  statement  should  be  of  the  “short-circuit”  variety  so  that  the  array  access
that  follows  is  not  out-of-bounds.

So  now  we  can  put  it  all  together.  Begin  with  a system  of  linear  equations 
(Definition  SLE),  and  represent  the  system  by  its  augmented  matrix  (Definition  AM). 
Use  row  operations  (Definition  RO)  to  convert  this  matrix  into  reduced  row-echelon 
form  (Definition  RREF),  using  the  procedure  outlined  in  the  proof  of  Theorem 
REMEF.  Theorem  REMEF  also  tells  us  we  can  always  accomplish  this,  and  that  the 
result  is  row-equivalent  (Definition  REM)  to  the  original  augmented  matrix.  Since 
the  matrix  in  reduced-row  echelon  form  has  the  same  solution  set,  we  can  analyze 
the  row-reduced  version  instead  of  the  original  matrix,  viewing  it  as  the  augmented 
matrix  of  a different  system  of  equations.  The  beauty  of  augmented  matrices  in 
reduced  row-echelon  form  is  that  the  solution  sets  to  the  systems  they  represent  can 
be  easily  determined,  as  we  will  see  in  the  next  few  examples  and  in  the  next  section. 

We  will  see  through  the  course  that  almost  every  interesting  property  of  a matrix 
can  be  discerned  by  looking  at  a row-equivalent  matrix  in  reduced  row-echelon  form. 
For this reason it is important to know that the matrix B that is guaranteed to exist by
Theorem REMEF is also unique.

Two  proof  techniques  are  applicable  to  the  proof.  First,  head  out  and  read  two 
proof  techniques:  Proof  Technique  CD  and  Proof  Technique  U. 

Theorem  RREFU  Reduced  Row-Echelon  Form  is  Unique 

Suppose that $A$ is an $m \times n$ matrix and that $B$ and $C$ are $m \times n$ matrices that are
row-equivalent to $A$ and in reduced row-echelon form. Then $B = C$.

Proof.  We  need  to  begin  with  no  assumptions  about  any  relationships  between  B 
and  C,  other  than  they  are  both  in  reduced  row-echelon  form,  and  they  are  both 
row-equivalent to A.

If  B and  C are  both  row-equivalent  to  A,  then  they  are  row-equivalent  to  each 
other.  Repeated  row  operations  on  a matrix  combine  the  rows  with  each  other  using 
operations  that  are  linear,  and  are  identical  in  each  column.  A key  observation  for 
this  proof  is  that  each  individual  row  of  B is  linearly  related  to  the  rows  of  C.  This 
relationship  is  different  for  each  row  of  B , but  once  we  fix  a row,  the  relationship  is 
the same across columns. More precisely, there are scalars $\delta_{ik}$, $1 \le i, k \le m$ such
that for any $1 \le i \le m$, $1 \le j \le n$,

\[
[B]_{ij} = \sum_{k=1}^{m} \delta_{ik} [C]_{kj}
\]

You should read this as saying that an entry of row $i$ of $B$ (in column $j$) is a
linear function of the entries of all the rows of $C$ that are also in column $j$, and the
scalars ($\delta_{ik}$) depend on which row of $B$ we are considering (the $i$ subscript on $\delta_{ik}$),
but are the same for every column (no dependence on $j$ in $\delta_{ik}$). This idea may be
complicated now, but will feel more familiar once we discuss “linear combinations”
(Definition LCCV) and more so when we discuss “row spaces” (Definition RSM).
For now, spend some time carefully working Exercise RREF.M40, which is designed
to illustrate the origins of this expression. This completes our exploitation of the
row-equivalence of $B$ and $C$.
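
A small numerical instance may make the role of these scalars concrete. The Python sketch below reuses the two row-equivalent matrices of Example TREM, with that example's $A$ playing the part of $C$ here. The particular scalars in `delta` were worked out by hand from the two row operations of that example, and the variable names are our own. It is an illustration only, separate from Exercise RREF.M40.

```python
# Matrices of Example TREM; A plays the role of C in the displayed formula.
A = [[2, -1, 3, 4], [5, 2, -2, 3], [1, 1, 0, 6]]
B = [[1, 1, 0, 6], [3, 0, -2, -9], [2, -1, 3, 4]]

# delta[i][k] is the scalar on row k of A used to build row i of B:
#   row 1 of B = row 3 of A
#   row 2 of B = row 2 of A - 2 (row 3 of A)
#   row 3 of B = row 1 of A
delta = [[0, 0, 1],
         [0, 1, -2],
         [1, 0, 0]]

m, n = len(A), len(A[0])
combo = [[sum(delta[i][k] * A[k][j] for k in range(m)) for j in range(n)]
         for i in range(m)]
print(combo == B)   # True
```

The same row of `delta` is used for every column $j$, which is the "no dependence on $j$" feature described above.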

We  now  repeatedly  exploit  the  fact  that  B and  C are  in  reduced  row-echelon 
form. Recall that a pivot column is all zeros, except a single one. More carefully, if
$R$ is a matrix in reduced row-echelon form, and $d_\ell$ is the index of a pivot column,
then $[R]_{k d_\ell} = 1$ precisely when $k = \ell$ and is otherwise zero. Notice also that any
entry of $R$ that is both below the entry in row $\ell$ and to the left of column $d_\ell$ is also
zero (with below and left understood to include equality). In other words, look at
examples of matrices in reduced row-echelon form and choose a leading 1 (with a
box around it). The rest of the column is also zeros, and the lower left “quadrant”
of the matrix that begins here is totally zeros.

Assuming no relationship about the form of $B$ and $C$, let $B$ have $r$ nonzero rows
and denote the pivot columns as $D = \{d_1, d_2, d_3, \dots, d_r\}$. For $C$ let $r'$ denote the
number of nonzero rows and denote the pivot columns as
$D' = \{d'_1, d'_2, d'_3, \dots, d'_{r'}\}$ (Definition RREF). There are four steps in the
proof, and the first three are about showing that $B$ and $C$ have the same number
of pivot columns, in the same places. In other words, the “primed” symbols are a
necessary fiction.

First Step. Suppose that $d_1 < d'_1$. Then

\begin{align*}
1 &= [B]_{1 d_1} && \text{Definition RREF} \\
&= \sum_{k=1}^{m} \delta_{1k} [C]_{k d_1} \\
&= \sum_{k=1}^{m} \delta_{1k} (0) && d_1 < d'_1 \\
&= 0
\end{align*}

The entries of $C$ are all zero since they are left and below of the leading 1 in row 1
and column $d'_1$ of $C$. This is a contradiction, so we know that $d_1 \ge d'_1$. By an entirely
similar argument, reversing the roles of $B$ and $C$, we could conclude that $d_1 \le d'_1$.
Together this means that $d_1 = d'_1$.

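Second Step. Suppose that we have determined that $d_1 = d'_1$, $d_2 = d'_2$, $d_3 = d'_3$,
..., $d_p = d'_p$. Let us now show that $d_{p+1} = d'_{p+1}$. Working towards a contradiction,
suppose that $d_{p+1} < d'_{p+1}$. For $1 \le \ell \le p$,

\begin{align*}
0 &= [B]_{p+1, d_\ell} && \text{Definition RREF} \\
&= \sum_{k=1}^{m} \delta_{p+1,k} [C]_{k d_\ell} \\
&= \sum_{k=1}^{m} \delta_{p+1,k} [C]_{k d'_\ell} \\
&= \delta_{p+1,\ell} [C]_{\ell d'_\ell} + \sum_{\substack{k=1 \\ k \ne \ell}}^{m} \delta_{p+1,k} [C]_{k d'_\ell} && \text{Property CACN} \\
&= \delta_{p+1,\ell} (1) + \sum_{\substack{k=1 \\ k \ne \ell}}^{m} \delta_{p+1,k} (0) && \text{Definition RREF} \\
&= \delta_{p+1,\ell}
\end{align*}

Now,

\begin{align*}
1 &= [B]_{p+1, d_{p+1}} && \text{Definition RREF} \\
&= \sum_{k=1}^{m} \delta_{p+1,k} [C]_{k d_{p+1}} \\
&= \sum_{k=1}^{p} \delta_{p+1,k} [C]_{k d_{p+1}} + \sum_{k=p+1}^{m} \delta_{p+1,k} [C]_{k d_{p+1}} && \text{Property AACN} \\
&= \sum_{k=1}^{p} (0) [C]_{k d_{p+1}} + \sum_{k=p+1}^{m} \delta_{p+1,k} [C]_{k d_{p+1}} \\
&= \sum_{k=p+1}^{m} \delta_{p+1,k} [C]_{k d_{p+1}} \\
&= \sum_{k=p+1}^{m} \delta_{p+1,k} (0) && d_{p+1} < d'_{p+1} \\
&= 0
\end{align*}

This contradiction shows that $d_{p+1} \ge d'_{p+1}$. By an entirely similar argument, we
could conclude that $d_{p+1} \le d'_{p+1}$, and therefore $d_{p+1} = d'_{p+1}$.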

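Third Step. Now we establish that $r = r'$. Suppose that $r' < r$. By the arguments
above, we know that $d_1 = d'_1$, $d_2 = d'_2$, $d_3 = d'_3$, ..., $d_{r'} = d'_{r'}$. For $1 \le \ell \le r' < r$,

\begin{align*}
0 &= [B]_{r d_\ell} && \text{Definition RREF} \\
&= \sum_{k=1}^{m} \delta_{rk} [C]_{k d_\ell} \\
&= \sum_{k=1}^{r'} \delta_{rk} [C]_{k d_\ell} + \sum_{k=r'+1}^{m} \delta_{rk} [C]_{k d_\ell} && \text{Property AACN} \\
&= \sum_{k=1}^{r'} \delta_{rk} [C]_{k d_\ell} + \sum_{k=r'+1}^{m} \delta_{rk} (0) && \text{Property AACN} \\
&= \sum_{k=1}^{r'} \delta_{rk} [C]_{k d_\ell} \\
&= \sum_{k=1}^{r'} \delta_{rk} [C]_{k d'_\ell} \\
&= \delta_{r\ell} [C]_{\ell d'_\ell} + \sum_{\substack{k=1 \\ k \ne \ell}}^{r'} \delta_{rk} [C]_{k d'_\ell} && \text{Property CACN} \\
&= \delta_{r\ell} (1) + \sum_{\substack{k=1 \\ k \ne \ell}}^{r'} \delta_{rk} (0) && \text{Definition RREF} \\
&= \delta_{r\ell}
\end{align*}

Now examine the entries of row $r$ of $B$,

\begin{align*}
[B]_{rj} &= \sum_{k=1}^{m} \delta_{rk} [C]_{kj} \\
&= \sum_{k=1}^{r'} \delta_{rk} [C]_{kj} + \sum_{k=r'+1}^{m} \delta_{rk} [C]_{kj} && \text{Property CACN} \\
&= \sum_{k=1}^{r'} \delta_{rk} [C]_{kj} + \sum_{k=r'+1}^{m} \delta_{rk} (0) && \text{Definition RREF} \\
&= \sum_{k=1}^{r'} \delta_{rk} [C]_{kj} \\
&= \sum_{k=1}^{r'} (0) [C]_{kj} \\
&= 0
\end{align*}

So row $r$ is a totally zero row, contradicting that this should be the bottommost
nonzero row of $B$. So $r' \ge r$. By an entirely similar argument, reversing the roles of
$B$ and $C$, we would conclude that $r' \le r$ and therefore $r = r'$. Thus, combining the
first three steps we can say that $D = D'$. In other words, $B$ and $C$ have the same
pivot columns, in the same locations.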

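Fourth Step. In this final step, we will not argue by contradiction. Our intent
is to determine the values of the $\delta_{ij}$. Notice that we can use the values of the $d_i$
interchangeably for B and C. Here we go,

\begin{align*}
1 &= [B]_{i d_i} && \text{Definition RREF} \\
&= \sum_{k=1}^{m} \delta_{ik} [C]_{k d_i} \\
&= \delta_{ii} [C]_{i d_i} + \sum_{\substack{k=1 \\ k \neq i}}^{m} \delta_{ik} [C]_{k d_i} && \text{Property CACN} \\
&= \delta_{ii} (1) + \sum_{\substack{k=1 \\ k \neq i}}^{m} \delta_{ik} (0) && \text{Definition RREF} \\
&= \delta_{ii}
\end{align*}

and for $\ell \neq i$

\begin{align*}
0 &= [B]_{i d_\ell} && \text{Definition RREF} \\
&= \sum_{k=1}^{m} \delta_{ik} [C]_{k d_\ell} \\
&= \delta_{i\ell} [C]_{\ell d_\ell} + \sum_{\substack{k=1 \\ k \neq \ell}}^{m} \delta_{ik} [C]_{k d_\ell} && \text{Property CACN} \\
&= \delta_{i\ell} (1) + \sum_{\substack{k=1 \\ k \neq \ell}}^{m} \delta_{ik} (0) && \text{Definition RREF} \\
&= \delta_{i\ell}
\end{align*}

Finally, having determined the values of the $\delta_{ij}$, we can show that B = C. For
$1 \leq i \leq m$, $1 \leq j \leq n$,

\begin{align*}
[B]_{ij} &= \sum_{k=1}^{m} \delta_{ik} [C]_{kj} \\
&= \delta_{ii} [C]_{ij} + \sum_{\substack{k=1 \\ k \neq i}}^{m} \delta_{ik} [C]_{kj} && \text{Property CACN} \\
&= (1) [C]_{ij} + \sum_{\substack{k=1 \\ k \neq i}}^{m} (0) [C]_{kj} \\
&= [C]_{ij}
\end{align*}

So B and C have equal values in every entry, and so are the same matrix. ■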


We will now run through some examples of using these definitions and theorems
to solve some systems of equations. From now on, when we have a matrix in reduced
row-echelon form, we will mark the leading 1's with a small box. This will help you
count, and identify, the pivot columns. In your work, you can box 'em, circle 'em or
write 'em in a different color. Just identify 'em somehow. This device will prove
very useful later and is a very good habit to start developing right now.

Example SAB Solutions for Archetype B

Let us find the solutions to the following system of equations,

\begin{align*}
-7x_1 - 6x_2 - 12x_3 &= -33 \\
5x_1 + 5x_2 + 7x_3 &= 24 \\
x_1 + 4x_3 &= 5
\end{align*}

First, form the augmented matrix,

\[
\begin{bmatrix} -7 & -6 & -12 & -33 \\ 5 & 5 & 7 & 24 \\ 1 & 0 & 4 & 5 \end{bmatrix}
\]

and row-reduce it to reduced row-echelon form,

\[
\begin{bmatrix} \boxed{1} & 0 & 0 & -3 \\ 0 & \boxed{1} & 0 & 5 \\ 0 & 0 & \boxed{1} & 2 \end{bmatrix}
\]

This is now the augmented matrix of a very simple system of equations, namely
$x_1 = -3$, $x_2 = 5$, $x_3 = 2$, which has an obvious solution. Furthermore, we can
see that this is the only solution to this system, so we have determined the entire
solution set,

\[
S = \left\{ \begin{bmatrix} -3 \\ 5 \\ 2 \end{bmatrix} \right\}
\]

You might compare this example with the procedure we used in Example US. △
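The computation in Example SAB can be checked by machine. What follows is a minimal sketch in plain Python, not the book's Sage code: the function name `rref` and its return convention are assumptions made for this illustration. It uses exact rational arithmetic and processes the columns from left to right, in the spirit of the procedure in Theorem REMEF.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row-echelon form, pivot columns) using exact arithmetic.

    Pivot column indices are 1-based, to match the notation of the text.
    """
    a = [[Fraction(x) for x in row] for row in rows]
    m, n = len(a), len(a[0])
    pivots = []
    r = 0  # number of nonzero rows produced so far
    for j in range(n):
        # Look for a nonzero entry in column j, at or below row r.
        i = next((i for i in range(r, m) if a[i][j] != 0), None)
        if i is None:
            continue
        a[r], a[i] = a[i], a[r]                  # swap rows
        lead = a[r][j]
        a[r] = [x / lead for x in a[r]]          # scale to create a leading 1
        for k in range(m):                       # clear the rest of column j
            if k != r and a[k][j] != 0:
                factor = a[k][j]
                a[k] = [x - factor * y for x, y in zip(a[k], a[r])]
        pivots.append(j + 1)
        r += 1
        if r == m:
            break
    return a, pivots


archetype_b = [
    [-7, -6, -12, -33],
    [5, 5, 7, 24],
    [1, 0, 4, 5],
]
reduced, pivots = rref(archetype_b)
solution = [row[-1] for row in reduced]
print(pivots)
print([int(x) for x in solution])
```

With three pivot columns among the first three columns, the final column of the reduced matrix holds the single solution of the system.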

Archetypes  A and  B are  meant  to  contrast  each  other  in  many  respects.  So  let 
us  solve  Archetype  A now. 


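Example SAA Solutions for Archetype A

Let us find the solutions to the following system of equations,

\begin{align*}
x_1 - x_2 + 2x_3 &= 1 \\
2x_1 + x_2 + x_3 &= 8 \\
x_1 + x_2 &= 5
\end{align*}

First, form the augmented matrix,

\[
\begin{bmatrix} 1 & -1 & 2 & 1 \\ 2 & 1 & 1 & 8 \\ 1 & 1 & 0 & 5 \end{bmatrix}
\]

and work to reduced row-echelon form, first with j = 1,

\[
\xrightarrow{-2R_1 + R_2}
\begin{bmatrix} \boxed{1} & -1 & 2 & 1 \\ 0 & 3 & -3 & 6 \\ 1 & 1 & 0 & 5 \end{bmatrix}
\xrightarrow{-1R_1 + R_3}
\begin{bmatrix} \boxed{1} & -1 & 2 & 1 \\ 0 & 3 & -3 & 6 \\ 0 & 2 & -2 & 4 \end{bmatrix}
\]

Now, with j = 2,

\[
\xrightarrow{\frac{1}{3}R_2}
\begin{bmatrix} \boxed{1} & -1 & 2 & 1 \\ 0 & \boxed{1} & -1 & 2 \\ 0 & 2 & -2 & 4 \end{bmatrix}
\xrightarrow{1R_2 + R_1}
\begin{bmatrix} \boxed{1} & 0 & 1 & 3 \\ 0 & \boxed{1} & -1 & 2 \\ 0 & 2 & -2 & 4 \end{bmatrix}
\]

\[
\xrightarrow{-2R_2 + R_3}
\begin{bmatrix} \boxed{1} & 0 & 1 & 3 \\ 0 & \boxed{1} & -1 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]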

The system of equations represented by this augmented matrix needs to be
considered a bit differently than that for Archetype B. First, the last row of the
matrix is the equation 0 = 0, which is always true, so it imposes no restrictions on
our possible solutions and therefore we can safely ignore it as we analyze the other
two equations. These equations are,

\begin{align*}
x_1 + x_3 &= 3 \\
x_2 - x_3 &= 2.
\end{align*}

While this system is fairly easy to solve, it also appears to have a multitude of
solutions. For example, choose $x_3 = 1$ and see that then $x_1 = 2$ and $x_2 = 3$ will
together form a solution. Or choose $x_3 = 0$, and then discover that $x_1 = 3$ and
$x_2 = 2$ lead to a solution. Try it yourself: pick any value of $x_3$ you please, and figure
out what $x_1$ and $x_2$ should be to make the first and second equations (respectively)
true. We'll wait while you do that. Because of this behavior, we say that $x_3$ is a "free"
or "independent" variable. But why do we vary $x_3$ and not some other variable? For
now, notice that the third column of the augmented matrix is not a pivot column.
With this idea, we can rearrange the two equations, solving each for the variable
whose index is the same as the column index of a pivot column.

\begin{align*}
x_1 &= 3 - x_3 \\
x_2 &= 2 + x_3
\end{align*}

To write the set of solution vectors in set notation, we have

\[
S = \left\{ \begin{bmatrix} 3 - x_3 \\ 2 + x_3 \\ x_3 \end{bmatrix} \,\middle|\, x_3 \in \mathbb{C} \right\}
\]

We will learn more in the next section about systems with infinitely many
solutions and how to express their solution sets. Right now, you might look back at
Example IS. △
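The invitation to "pick any value of x3 you please" in Example SAA can be tried out mechanically. The sketch below is only an illustration; the helper names `archetype_a_residuals` and `solution_for` are made up here. It builds the solution determined by a choice of the free variable and substitutes it into the three original equations of Archetype A.

```python
def archetype_a_residuals(x1, x2, x3):
    """Left-hand side minus right-hand side for each equation of Archetype A."""
    return (
        x1 - x2 + 2 * x3 - 1,
        2 * x1 + x2 + x3 - 8,
        x1 + x2 - 5,
    )


def solution_for(x3):
    """Solution determined by a choice of the free variable x3."""
    return (3 - x3, 2 + x3, x3)


for x3 in (0, 1, -4, 10):
    x = solution_for(x3)
    # Every residual is zero, so all three equations hold for this choice.
    assert archetype_a_residuals(*x) == (0, 0, 0)
    print(x3, x)
```

Each choice of x3 gives a different solution, which is why the solution set is infinite.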

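Example SAE Solutions for Archetype E

Let us find the solutions to the following system of equations,

\begin{align*}
2x_1 + x_2 + 7x_3 - 7x_4 &= 2 \\
-3x_1 + 4x_2 - 5x_3 - 6x_4 &= 3 \\
x_1 + x_2 + 4x_3 - 5x_4 &= 2
\end{align*}

First, form the augmented matrix,

\[
\begin{bmatrix} 2 & 1 & 7 & -7 & 2 \\ -3 & 4 & -5 & -6 & 3 \\ 1 & 1 & 4 & -5 & 2 \end{bmatrix}
\]

and work to reduced row-echelon form, first with j = 1,

\[
\xrightarrow{R_1 \leftrightarrow R_3}
\begin{bmatrix} 1 & 1 & 4 & -5 & 2 \\ -3 & 4 & -5 & -6 & 3 \\ 2 & 1 & 7 & -7 & 2 \end{bmatrix}
\xrightarrow{3R_1 + R_2}
\begin{bmatrix} 1 & 1 & 4 & -5 & 2 \\ 0 & 7 & 7 & -21 & 9 \\ 2 & 1 & 7 & -7 & 2 \end{bmatrix}
\]

\[
\xrightarrow{-2R_1 + R_3}
\begin{bmatrix} \boxed{1} & 1 & 4 & -5 & 2 \\ 0 & 7 & 7 & -21 & 9 \\ 0 & -1 & -1 & 3 & -2 \end{bmatrix}
\]

Now, with j = 2,

\[
\xrightarrow{R_2 \leftrightarrow R_3}
\begin{bmatrix} \boxed{1} & 1 & 4 & -5 & 2 \\ 0 & -1 & -1 & 3 & -2 \\ 0 & 7 & 7 & -21 & 9 \end{bmatrix}
\xrightarrow{-1R_2}
\begin{bmatrix} \boxed{1} & 1 & 4 & -5 & 2 \\ 0 & 1 & 1 & -3 & 2 \\ 0 & 7 & 7 & -21 & 9 \end{bmatrix}
\]

\[
\xrightarrow{-1R_2 + R_1}
\begin{bmatrix} \boxed{1} & 0 & 3 & -2 & 0 \\ 0 & 1 & 1 & -3 & 2 \\ 0 & 7 & 7 & -21 & 9 \end{bmatrix}
\xrightarrow{-7R_2 + R_3}
\begin{bmatrix} \boxed{1} & 0 & 3 & -2 & 0 \\ 0 & \boxed{1} & 1 & -3 & 2 \\ 0 & 0 & 0 & 0 & -5 \end{bmatrix}
\]

And finally, with j = 4,

\[
\xrightarrow{-\frac{1}{5}R_3}
\begin{bmatrix} \boxed{1} & 0 & 3 & -2 & 0 \\ 0 & \boxed{1} & 1 & -3 & 2 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\xrightarrow{-2R_3 + R_2}
\begin{bmatrix} \boxed{1} & 0 & 3 & -2 & 0 \\ 0 & \boxed{1} & 1 & -3 & 0 \\ 0 & 0 & 0 & 0 & \boxed{1} \end{bmatrix}
\]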

Let us analyze the equations in the system represented by this augmented matrix.
The third equation will read 0 = 1. This is patently false, all the time. No choice
of values for our variables will ever make it true. We are done. Since we cannot
even make the last equation true, we have no hope of making all of the equations
simultaneously true. So this system has no solutions, and its solution set is the empty
set, ∅ = { } (Definition ES).

Notice that we could have reached this conclusion sooner. After performing the
row operation $-7R_2 + R_3$, we can see that the third equation reads 0 = -5, a false
statement. Since the system represented by this matrix has no solutions, none of the
systems represented has any solutions. However, for this example, we have chosen to
bring the matrix all the way to reduced row-echelon form as practice. △


These  three  examples  (Example  SAB,  Example  SAA,  Example  SAE)  illustrate 
the  full  range  of  possibilities  for  a system  of  linear  equations  — no  solutions,  one 
solution,  or  infinitely  many  solutions.  In  the  next  section  we  will  examine  these  three 
scenarios  more  closely. 

We  (and  everybody  else)  will  often  speak  of  “row-reducing”  a matrix.  This  is  an 
informal  way  of  saying  we  begin  with  a matrix  A and  then  analyze  the  matrix  B that 
is  row-equivalent  to  A and  in  reduced  row-echelon  form.  So  the  term  row-reduce  is 
used  as  a verb,  but  describes  something  a bit  more  complicated,  since  we  do  not  really 
change  A.  Theorem  REMEF  tells  us  that  this  process  will  always  be  successful  and 
Theorem  RREFU  tells  us  that  B will  be  unambiguous.  Typically,  an  investigation 
of  A will  proceed  by  analyzing  B and  applying  theorems  whose  hypotheses  include 
the row-equivalence of A and B, and usually the hypothesis that B is in reduced
row-echelon form.


Reading Questions

1. Is the matrix below in reduced row-echelon form? Why or why not?

\[
\begin{bmatrix} 1 & 5 & 0 & 6 & 8 \\ 0 & 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\]

2. Use row operations to convert the matrix below to reduced row-echelon form and report
the final matrix.

\[
\begin{bmatrix} 2 & 1 & 8 \\ -1 & 1 & -1 \\ -2 & 5 & 4 \end{bmatrix}
\]

3. Find all the solutions to the system below by using an augmented matrix and row
operations. Report your final matrix in reduced row-echelon form and the set of solutions.

\begin{align*}
2x_1 + 3x_2 - x_3 &= 0 \\
x_1 + 2x_2 + x_3 &= 3 \\
x_1 + 3x_2 + 3x_3 &= 7
\end{align*}

Exercises 

C05  Each  archetype  below  is  a system  of  equations.  Form  the  augmented  matrix  of  the 
system  of  equations,  convert  the  matrix  to  reduced  row-echelon  form  by  using  equation 
operations  and  then  describe  the  solution  set  of  the  original  system  of  equations. 

Archetype  A,  Archetype  B,  Archetype  C,  Archetype  D,  Archetype  E,  Archetype  F,  Archetype 
G,  Archetype  H,  Archetype  I,  Archetype  J 

For problems C10-C19, find all solutions to the system of linear equations. Use your favorite
computing device to row-reduce the augmented matrices for the systems, and write the
solutions as a set, using correct set notation.

C10†
\begin{align*}
2x_1 - 3x_2 + x_3 + 7x_4 &= 14 \\
2x_1 + 8x_2 - 4x_3 + 5x_4 &= -1 \\
x_1 + 3x_2 - 3x_3 &= 4 \\
-5x_1 + 2x_2 + 3x_3 + 4x_4 &= -19
\end{align*}

C11†
\begin{align*}
3x_1 + 4x_2 - x_3 + 2x_4 &= 6 \\
x_1 - 2x_2 + 3x_3 + x_4 &= 2 \\
10x_2 - 10x_3 - x_4 &= 1
\end{align*}

C12†
\begin{align*}
2x_1 + 4x_2 + 5x_3 + 7x_4 &= -26 \\
x_1 + 2x_2 + x_3 - x_4 &= -4 \\
-2x_1 - 4x_2 + x_3 + 11x_4 &= -10
\end{align*}

C13†
\begin{align*}
x_1 + 2x_2 + 8x_3 - 7x_4 &= -2 \\
3x_1 + 2x_2 + 12x_3 - 5x_4 &= 6 \\
-x_1 + x_2 + x_3 - 5x_4 &= -10
\end{align*}

C14†
\begin{align*}
2x_1 + x_2 + 7x_3 - 2x_4 &= 4 \\
3x_1 - 2x_2 + 11x_4 &= 13 \\
x_1 + x_2 + 5x_3 - 3x_4 &= 1
\end{align*}

C15†
\begin{align*}
2x_1 + 3x_2 - x_3 - 9x_4 &= -16 \\
x_1 + 2x_2 + x_3 &= 0 \\
-x_1 + 2x_2 + 3x_3 + 4x_4 &= 8
\end{align*}

C16†
\begin{align*}
2x_1 + 3x_2 + 19x_3 - 4x_4 &= 2 \\
x_1 + 2x_2 + 12x_3 - 3x_4 &= 1 \\
-x_1 + 2x_2 + 8x_3 - 5x_4 &= 1
\end{align*}

C17†
\begin{align*}
-x_1 + 5x_2 &= -8 \\
-2x_1 + 5x_2 + 5x_3 + 2x_4 &= 9 \\
-3x_1 - x_2 + 3x_3 + x_4 &= 3 \\
7x_1 + 6x_2 + 5x_3 + x_4 &= 30
\end{align*}

C18†
\begin{align*}
x_1 + 2x_2 - 4x_3 - x_4 &= 32 \\
x_1 + 3x_2 - 7x_3 - x_5 &= 33 \\
x_1 + 2x_3 - 2x_4 + 3x_5 &= 22
\end{align*}

C19†
\begin{align*}
2x_1 + x_2 &= 6 \\
-x_1 - x_2 &= -2 \\
3x_1 + 4x_2 &= 4 \\
3x_1 + 5x_2 &= 2
\end{align*}


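For problems C30-C33, row-reduce the matrix without the aid of a calculator, indicating
the row operations you are using at each step using the notation of Definition RO.

C30†
\[
\begin{bmatrix} 2 & 1 & 5 & 10 \\ 1 & -3 & -1 & -2 \\ 4 & -2 & 6 & 12 \end{bmatrix}
\]

C31†
\[
\begin{bmatrix} 1 & 2 & -4 \\ -3 & -1 & -3 \\ -2 & 1 & -7 \end{bmatrix}
\]

C32†
\[
\begin{bmatrix} 1 & 1 & 1 \\ -4 & -3 & -2 \\ 3 & 2 & 1 \end{bmatrix}
\]

C33†
\[
\begin{bmatrix} 1 & 2 & -1 & -1 \\ 2 & 4 & -1 & 4 \\ -1 & -2 & 3 & 5 \end{bmatrix}
\]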

M40† Consider the two 3 × 4 matrices below

\[
B = \begin{bmatrix} 1 & 3 & -2 & 2 \\ -1 & -2 & -1 & -1 \\ -1 & -5 & 8 & -3 \end{bmatrix}
\qquad
C = \begin{bmatrix} 1 & 2 & 1 & 2 \\ 1 & 1 & 4 & 0 \\ -1 & -1 & -4 & 1 \end{bmatrix}
\]

1. Row-reduce each matrix and determine that the reduced row-echelon forms of B and
C are identical. From this argue that B and C are row-equivalent.

2. In the proof of Theorem RREFU, we begin by arguing that entries of row-equivalent
matrices are related by way of certain scalars and sums. In this example, we would
write that entries of B from row i that are in column j are linearly related to the
entries of C in column j from all three rows

\[
[B]_{ij} = \delta_{i1} [C]_{1j} + \delta_{i2} [C]_{2j} + \delta_{i3} [C]_{3j} \qquad 1 \leq j \leq 4
\]

For each $1 \leq i \leq 3$ find the corresponding three scalars in this relationship. So your
answer will be nine scalars, determined three at a time.

M45† You keep a number of lizards, mice and peacocks as pets. There are a total of 108
legs and 30 tails in your menagerie. You have twice as many mice as lizards. How many of
each creature do you have?

M50† A parking lot has 66 vehicles (cars, trucks, motorcycles and bicycles) in it. There
are four times as many cars as trucks. The total number of tires (4 per car or truck, 2 per
motorcycle or bicycle) is 252. How many cars are there? How many bicycles?

T10† Prove that each of the three row operations (Definition RO) is reversible. More
precisely, if the matrix B is obtained from A by application of a single row operation, show
that there is a single row operation that will transform B back into A.

T11 Suppose that A, B and C are m × n matrices. Use the definition of row-equivalence
(Definition REM) to prove the following three facts.

1.  A is  row-equivalent  to  A. 

2.  If  A is  row-equivalent  to  B,  then  B is  row-equivalent  to  A. 

3.  If  A is  row-equivalent  to  B,  and  B is  row-equivalent  to  C,  then  A is  row-equivalent 
to  C. 

A relationship  that  satisfies  these  three  properties  is  known  as  an  equivalence  relation, 
an  important  idea  in  the  study  of  various  algebras.  This  is  a formal  way  of  saying  that 
a relationship  behaves  like  equality,  without  requiring  the  relationship  to  be  as  strict  as 
equality  itself.  We  will  see  it  again  in  Theorem  SER. 

T12 Suppose that B is an m × n matrix in reduced row-echelon form. Build a new, likely
smaller, k × ℓ matrix C as follows. Keep any collection of k adjacent rows, k ≤ m. From
these rows, keep columns 1 through ℓ, ℓ ≤ n. Prove that C is in reduced row-echelon form.

T13  Generalize  Exercise  RREF.T12  by  just  keeping  any  k rows,  and  not  requiring  the 
rows  to  be  adjacent.  Prove  that  any  such  matrix  C is  in  reduced  row-echelon  form. 


Section  TSS 

Types  of  Solution  Sets 

We  will  now  be  more  careful  about  analyzing  the  reduced  row-echelon  form  derived 
from  the  augmented  matrix  of  a system  of  linear  equations.  In  particular,  we  will  see 
how  to  systematically  handle  the  situation  when  we  have  infinitely  many  solutions 
to  a system,  and  we  will  prove  that  every  system  of  linear  equations  has  either  zero, 
one  or  infinitely  many  solutions.  With  these  tools,  we  will  be  able  to  routinely  solve 
any  linear  system. 

Subsection  CS 
Consistent  Systems 

The  computer  scientist  Donald  Knuth  said,  “Science  is  what  we  understand  well 
enough  to  explain  to  a computer.  Art  is  everything  else.”  In  this  section  we  will 
remove  solving  systems  of  equations  from  the  realm  of  art,  and  into  the  realm  of 
science.  We  begin  with  a definition. 

Definition  CS  Consistent  System 

A system  of  linear  equations  is  consistent  if  it  has  at  least  one  solution.  Otherwise, 
the  system  is  called  inconsistent.  □ 

We  will  want  to  first  recognize  when  a system  is  inconsistent  or  consistent,  and  in 
the  case  of  consistent  systems  we  will  be  able  to  further  refine  the  types  of  solutions 
possible.  We  will  do  this  by  analyzing  the  reduced  row-echelon  form  of  a matrix, 
using  the  value  of  r,  and  the  sets  of  column  indices,  D and  F,  first  defined  back  in 
Definition  RREF. 

Use  of  the  notation  for  the  elements  of  D and  F can  be  a bit  confusing,  since 
we  have  subscripted  variables  that  are  in  turn  equal  to  integers  used  to  index  the 
matrix.  However,  many  questions  about  matrices  and  systems  of  equations  can  be 
answered once we know r, D and F. The choice of the letters D and F refers to our
upcoming definition of dependent and free variables (Definition IDV). An example
will  help  us  begin  to  get  comfortable  with  this  aspect  of  reduced  row-echelon  form. 


Example RREFN Reduced row-echelon form notation

For the 5 × 9 matrix

\[
B = \begin{bmatrix}
\boxed{1} & 5 & 0 & 0 & 2 & 8 & 0 & 5 & -1 \\
0 & 0 & \boxed{1} & 0 & 4 & 7 & 0 & 2 & 0 \\
0 & 0 & 0 & \boxed{1} & 3 & 9 & 0 & 3 & -6 \\
0 & 0 & 0 & 0 & 0 & 0 & \boxed{1} & 4 & 2 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

in reduced row-echelon form we have

\begin{align*}
r &= 4 \\
d_1 &= 1 & d_2 &= 3 & d_3 &= 4 & d_4 &= 7 \\
f_1 &= 2 & f_2 &= 5 & f_3 &= 6 & f_4 &= 8 & f_5 &= 9
\end{align*}

Notice that the sets

\[
D = \{d_1, d_2, d_3, d_4\} = \{1, 3, 4, 7\} \qquad F = \{f_1, f_2, f_3, f_4, f_5\} = \{2, 5, 6, 8, 9\}
\]

have nothing in common and together account for all of the columns of B (we say it
is a partition of the set of column indices). △
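Reading off r, D and F from a matrix in reduced row-echelon form is mechanical, as the following sketch suggests. The function name `pivot_data` is an assumption made for this illustration, and the function presumes its input is already in reduced row-echelon form. It is applied to the reduced matrices found in Example SAA and Example SAE.

```python
def pivot_data(b):
    """Return (r, D, F) for a matrix b that is already in reduced row-echelon form.

    D and F are lists of 1-based column indices, as in the text.
    """
    n_cols = len(b[0])
    d = []
    for row in b:
        # The leading 1 of a nonzero row is its first nonzero entry.
        lead = next((j for j, entry in enumerate(row) if entry != 0), None)
        if lead is not None:
            d.append(lead + 1)
    f = [j for j in range(1, n_cols + 1) if j not in d]
    return len(d), d, f


reduced_a = [[1, 0, 1, 3], [0, 1, -1, 2], [0, 0, 0, 0]]
reduced_e = [[1, 0, 3, -2, 0], [0, 1, 1, -3, 0], [0, 0, 0, 0, 1]]
print(pivot_data(reduced_a))
print(pivot_data(reduced_e))
```

In both cases D and F have no index in common and together list every column, which is the partition property noted in the example.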

The  number  r is  the  single  most  important  piece  of  information  we  can  get  from 
the  reduced  row-echelon  form  of  a matrix.  It  is  defined  as  the  number  of  nonzero 
rows, but since each nonzero row has a leading 1, it is also the number of leading
1's present. For each leading 1, we have a pivot column, so r is also the number of
pivot columns. Repeating ourselves, r is the number of nonzero rows, the number
of leading 1's and the number of pivot columns. Across different situations, each
of  these  interpretations  of  the  meaning  of  r will  be  useful,  though  it  may  be  most 
helpful  to  think  in  terms  of  pivot  columns. 

Before  proving  some  theorems  about  the  possibilities  for  solution  sets  to  systems 
of  equations,  let  us  analyze  one  particular  system  with  an  infinite  solution  set  very 
carefully  as  an  example.  We  will  use  this  technique  frequently,  and  shortly  we  will 
refine  it  slightly. 

Archetypes  I and  J are  both  fairly  large  for  doing  computations  by  hand  (though 
not  impossibly  large).  Their  properties  are  very  similar,  so  we  will  frequently  analyze 
the  situation  in  Archetype  I,  and  leave  you  the  joy  of  analyzing  Archetype  J yourself. 
So  work  through  Archetype  I with  the  text,  by  hand  and/or  with  a computer,  and 
then  tackle  Archetype  J yourself  (and  check  your  results  with  those  listed).  Notice 
too  that  the  archetypes  describing  systems  of  equations  each  lists  the  values  of  r,  D 
and  F.  Here  we  go. . . 

Example ISSI Describing infinite solution sets, Archetype I
Archetype I is the system of m = 4 equations in n = 7 variables.

\begin{align*}
x_1 + 4x_2 - x_4 + 7x_6 - 9x_7 &= 3 \\
2x_1 + 8x_2 - x_3 + 3x_4 + 9x_5 - 13x_6 + 7x_7 &= 9 \\
2x_3 - 3x_4 - 4x_5 + 12x_6 - 8x_7 &= 1 \\
-x_1 - 4x_2 + 2x_3 + 4x_4 + 8x_5 - 31x_6 + 37x_7 &= 4
\end{align*}

This system has a 4 × 8 augmented matrix that is row-equivalent to the following
matrix (check this!), and which is in reduced row-echelon form (the existence of
this matrix is guaranteed by Theorem REMEF and its uniqueness is guaranteed by
Theorem RREFU),

\[
\begin{bmatrix}
\boxed{1} & 4 & 0 & 0 & 2 & 1 & -3 & 4 \\
0 & 0 & \boxed{1} & 0 & 1 & -3 & 5 & 2 \\
0 & 0 & 0 & \boxed{1} & 2 & -6 & 6 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

So we find that r = 3 and

\[
D = \{d_1, d_2, d_3\} = \{1, 3, 4\} \qquad F = \{f_1, f_2, f_3, f_4, f_5\} = \{2, 5, 6, 7, 8\}
\]

Let i denote any one of the r = 3 nonzero rows. Then the index $d_i$ is a pivot
column. It will be easy in this case to use the equation represented by row i to
write an expression for the variable $x_{d_i}$. It will be a linear function of the variables
$x_{f_1}$, $x_{f_2}$, $x_{f_3}$, $x_{f_4}$ (notice that $f_5 = 8$ does not reference a variable, but does tell us
the final column is not a pivot column). We will now construct these three expressions.
Notice that using subscripts upon subscripts takes some getting used to.

\begin{align*}
(i = 1) \quad x_{d_1} = x_1 &= 4 - 4x_2 - 2x_5 - x_6 + 3x_7 \\
(i = 2) \quad x_{d_2} = x_3 &= 2 - x_5 + 3x_6 - 5x_7 \\
(i = 3) \quad x_{d_3} = x_4 &= 1 - 2x_5 + 6x_6 - 6x_7
\end{align*}

Each element of the set $F = \{f_1, f_2, f_3, f_4, f_5\} = \{2, 5, 6, 7, 8\}$ is the index
of a variable, except for $f_5 = 8$. We refer to $x_{f_1} = x_2$, $x_{f_2} = x_5$, $x_{f_3} = x_6$ and
$x_{f_4} = x_7$ as "free" (or "independent") variables since they are allowed to assume
any possible combination of values that we can imagine and we can continue on to
build a solution to the system by solving individual equations for the values of the
other ("dependent") variables.

Each element of the set $D = \{d_1, d_2, d_3\} = \{1, 3, 4\}$ is the index of a variable.
We refer to the variables $x_{d_1} = x_1$, $x_{d_2} = x_3$ and $x_{d_3} = x_4$ as "dependent" variables
since they depend on the independent variables. More precisely, for each possible
choice of values for the independent variables we get exactly one set of values for
the dependent variables that combine to form a solution of the system.

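To express the solutions as a set, we write

\[
\left\{
\begin{bmatrix}
4 - 4x_2 - 2x_5 - x_6 + 3x_7 \\
x_2 \\
2 - x_5 + 3x_6 - 5x_7 \\
1 - 2x_5 + 6x_6 - 6x_7 \\
x_5 \\
x_6 \\
x_7
\end{bmatrix}
\,\middle|\, x_2, x_5, x_6, x_7 \in \mathbb{C}
\right\}
\]

The condition that $x_2, x_5, x_6, x_7 \in \mathbb{C}$ is how we specify that the variables
$x_2$, $x_5$, $x_6$, $x_7$ are "free" to assume any possible values.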

This systematic approach to solving a system of equations will allow us to create
a precise description of the solution set for any consistent system once we have found
the reduced row-echelon form of the augmented matrix. It will work just as well
when the set of free variables is empty and we get just a single solution. And we
could program a computer to do it! Now have a whack at Archetype J (Exercise
TSS.C10), mimicking the discussion in this example. We'll still be here when you
get back. △
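As a check on the description of the solution set in Example ISSI, the sketch below picks values for the four free variables, computes the three dependent variables from the expressions in the example, and substitutes the result into the original four equations of Archetype I. The names `archetype_i_solution`, `COEFFS`, `RHS` and `satisfies_system` are illustrative choices, not part of the text.

```python
from itertools import product

# Coefficients and constants of Archetype I, one row per equation.
COEFFS = [
    [1, 4, 0, -1, 0, 7, -9],
    [2, 8, -1, 3, 9, -13, 7],
    [0, 0, 2, -3, -4, 12, -8],
    [-1, -4, 2, 4, 8, -31, 37],
]
RHS = [3, 9, 1, 4]


def archetype_i_solution(x2, x5, x6, x7):
    """Build a full solution vector from values of the free variables."""
    x1 = 4 - 4 * x2 - 2 * x5 - x6 + 3 * x7
    x3 = 2 - x5 + 3 * x6 - 5 * x7
    x4 = 1 - 2 * x5 + 6 * x6 - 6 * x7
    return (x1, x2, x3, x4, x5, x6, x7)


def satisfies_system(x):
    """True when x makes every equation of Archetype I true."""
    return all(
        sum(c * v for c, v in zip(row, x)) == b
        for row, b in zip(COEFFS, RHS)
    )


# Try every combination of small integer values for the free variables.
for free in product(range(-2, 3), repeat=4):
    assert satisfies_system(archetype_i_solution(*free))
print(archetype_i_solution(0, 0, 0, 0))
```

Setting every free variable to zero gives the solution whose dependent entries are read straight from the final column of the reduced matrix.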

Using  the  reduced  row-echelon  form  of  the  augmented  matrix  of  a system  of 
equations  to  determine  the  nature  of  the  solution  set  of  the  system  is  a very  key 
idea.  So  let  us  look  at  one  more  example  like  the  last  one.  But  first  a definition,  and 
then  the  example.  We  mix  our  metaphors  a bit  when  we  call  variables  free  versus 
dependent.  Maybe  we  should  call  dependent  variables  “enslaved”? 

Definition  IDV  Independent  and  Dependent  Variables 

Suppose  A is  the  augmented  matrix  of  a consistent  system  of  linear  equations  and 
B is  a row-equivalent  matrix  in  reduced  row-echelon  form.  Suppose  j is  the  index 
of  a pivot  column  of  B.  Then  the  variable  Xj  is  dependent.  A variable  that  is  not 
dependent  is  called  independent  or  free.  □ 

If  you  studied  this  definition  carefully,  you  might  wonder  what  to  do  if  the  system 
has  n variables  and  column  n + 1 is  a pivot  column?  We  will  see  shortly,  by  Theorem 
RCLS,  that  this  never  happens  for  a consistent  system. 

Example FDV Free and dependent variables
Consider the system of five equations in five variables,

\begin{align*}
x_1 - x_2 - 2x_3 + x_4 + 11x_5 &= 13 \\
x_1 - x_2 + x_3 + x_4 + 5x_5 &= 16 \\
2x_1 - 2x_2 + x_4 + 10x_5 &= 21 \\
2x_1 - 2x_2 - x_3 + 3x_4 + 20x_5 &= 38 \\
2x_1 - 2x_2 + x_3 + x_4 + 8x_5 &= 22
\end{align*}

whose augmented matrix row-reduces to

\[
\begin{bmatrix}
\boxed{1} & -1 & 0 & 0 & 3 & 6 \\
0 & 0 & \boxed{1} & 0 & -2 & 1 \\
0 & 0 & 0 & \boxed{1} & 4 & 9 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

Columns 1, 3 and 4 are pivot columns, so D = {1, 3, 4}. From this we know
that the variables $x_1$, $x_3$ and $x_4$ will be dependent variables, and each of the r = 3
nonzero rows of the row-reduced matrix will yield an expression for one of these
three variables. The set F is all the remaining column indices, F = {2, 5, 6}. The
column index 6 in F means that the final column is not a pivot column, and thus
the system is consistent (Theorem RCLS). The remaining indices in F indicate free
variables, so $x_2$ and $x_5$ (the remaining variables) are our free variables. The resulting
three equations that describe our solution set are then,

\begin{align*}
(x_{d_1} = x_1) \quad x_1 &= 6 + x_2 - 3x_5 \\
(x_{d_2} = x_3) \quad x_3 &= 1 + 2x_5 \\
(x_{d_3} = x_4) \quad x_4 &= 9 - 4x_5
\end{align*}

Make sure you understand where these three equations came from, and notice
how the location of the pivot columns determined the variables on the left-hand side
of each equation. We can compactly describe the solution set as,

\[
S = \left\{
\begin{bmatrix} 6 + x_2 - 3x_5 \\ x_2 \\ 1 + 2x_5 \\ 9 - 4x_5 \\ x_5 \end{bmatrix}
\,\middle|\, x_2, x_5 \in \mathbb{C}
\right\}
\]

Notice how we express the freedom for $x_2$ and $x_5$: $x_2, x_5 \in \mathbb{C}$. △

Sets  are  an  important  part  of  algebra,  and  we  have  seen  a few  already.  Being 
comfortable  with  sets  is  important  for  understanding  and  writing  proofs.  If  you  have 
not  already,  pay  a visit  now  to  Section  SET. 

We can now use the values of m, n, r, and the independent and dependent
variables to categorize the solution sets for linear systems through a sequence of
theorems.

Through  the  following  sequence  of  proofs,  you  will  want  to  consult  three  proof 
techniques.  See  Proof  Technique  E,  Proof  Technique  N,  Proof  Technique  CP. 

First  we  have  an  important  theorem  that  explores  the  distinction  between 
consistent  and  inconsistent  linear  systems. 

Theorem RCLS Recognizing Consistency of a Linear System
Suppose A is the augmented matrix of a system of linear equations with n variables.
Suppose also that B is a row-equivalent matrix in reduced row-echelon form with r
nonzero rows. Then the system of equations is inconsistent if and only if column
n + 1 of B is a pivot column.

Proof. (⇐) The first half of the proof begins with the assumption that column n + 1
of B is a pivot column. Then the leading 1 of row r is located in column n + 1 of
B and so row r of B begins with n consecutive zeros, finishing with the leading 1.
This is a representation of the equation 0 = 1, which is false. Since this equation
is false for any collection of values we might choose for the variables, there are no
solutions for the system of equations, and the system is inconsistent.

(⇒) For the second half of the proof, we wish to show that if we assume the system
is inconsistent, then column n + 1 of B is a pivot column. But instead of proving this
directly, we will form the logically equivalent statement that is the contrapositive,
and prove that instead (see Proof Technique CP). Turning the implication around,
and negating each portion, we arrive at the logically equivalent statement: if column
n + 1 of B is not a pivot column, then the system of equations is consistent.

If column n + 1 of B is not a pivot column, the leading 1 for row r is located
somewhere in columns 1 through n. Then every preceding row's leading 1 is also
located in columns 1 through n. In other words, since the last leading 1 is not in
the last column, no leading 1 for any row is in the last column, due to the echelon
layout of the leading 1's (Definition RREF). We will now construct a solution to the
system by setting each dependent variable to the entry of the final column in the
row with the corresponding leading 1, and setting each free variable to zero. That
sentence is pretty vague, so let us be more precise. Using our notation for the sets
D and F from the reduced row-echelon form (Definition RREF):

\[
x_{d_i} = [B]_{i,n+1}, \quad 1 \leq i \leq r \qquad\qquad x_{f_i} = 0, \quad 1 \leq i \leq n - r
\]

These values for the variables make the equations represented by the first r rows
of B all true (convince yourself of this). Rows numbered greater than r (if any) are
all zero rows, hence represent the equation 0 = 0 and are also all true. We have now
identified one solution to the system represented by B, and hence a solution to the
system represented by A (Theorem REMES). So we can say the system is consistent
(Definition CS). ■

The  beauty  of  this  theorem  being  an  equivalence  is  that  we  can  unequivocally 
test  to  see  if  a system  is  consistent  or  inconsistent  by  looking  at  just  a single  entry 
of  the  reduced  row-echelon  form  matrix.  We  could  program  a computer  to  do  it! 
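Here is one way such a program might look. This is a sketch only: the function name `is_consistent` is an assumption, and the function expects an augmented matrix that is already in reduced row-echelon form. Following Theorem RCLS, it locates the leading 1 of the last nonzero row and checks whether it sits in column n + 1.

```python
def is_consistent(b):
    """Apply Theorem RCLS to an augmented matrix b in reduced row-echelon form."""
    last_col = len(b[0]) - 1  # 0-based position of column n + 1
    nonzero_rows = [row for row in b if any(entry != 0 for entry in row)]
    if not nonzero_rows:
        return True  # every equation reads 0 = 0
    last = nonzero_rows[-1]
    # Position of the leading 1 in the bottommost nonzero row.
    lead = next(j for j, entry in enumerate(last) if entry != 0)
    return lead != last_col


reduced_a = [[1, 0, 1, 3], [0, 1, -1, 2], [0, 0, 0, 0]]           # Example SAA
reduced_e = [[1, 0, 3, -2, 0], [0, 1, 1, -3, 0], [0, 0, 0, 0, 1]]  # Example SAE
print(is_consistent(reduced_a))
print(is_consistent(reduced_e))
```

Only the bottommost nonzero row needs to be inspected, because the echelon layout forces every other leading 1 to lie further to the left.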

Notice that for a consistent system the row-reduced augmented matrix has
n + 1 ∈ F, so the largest element of F does not refer to a variable. Also, for an
inconsistent system, n + 1 ∈ D, and it then does not make much sense to discuss
whether or not variables are free or dependent since there is no solution. Take a look
back at Definition IDV and see why we did not need to consider the possibility of
referencing $x_{n+1}$ as a dependent variable.

With  the  characterization  of  Theorem  RCLS,  we  can  explore  the  relationships 
between  r and  n for  a consistent  system.  We  can  distinguish  between  the  case  of  a 
unique  solution  and  infinitely  many  solutions,  and  furthermore,  we  recognize  that 
these  are  the  only  two  possibilities. 

Theorem  CSRN  Consistent  Systems,  r and  n 

Suppose  A is  the  augmented  matrix  of  a consistent  system  of  linear  equations  with 
n variables.  Suppose  also  that  B is  a row- equivalent  matrix  in  reduced  row-echelon 
form  with  r pivot  columns.  Then  r < n.  If  r = n,  then  the  system  has  a unique 
solution,  and  if  r < n,  then  the  system  has  infinitely  many  solutions. 

Proof.  This  theorem  contains  three  implications  that  we  must  establish.  Notice  first 
that  B has  n+  1 columns,  so  there  can  be  at  most  n+1  pivot  columns,  i.e.  r < n + 1. 
If  r = n + 1,  then  every  column  of  B is  a pivot  column,  and  in  particular,  the  last 
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column  is  a pivot  column.  So  Theorem  RCLS  tells  us  that  the  system  is  inconsistent, 
contrary  to  our  hypothesis.  We  are  left  with  r < n. 

When r = n, we find n - r = 0 free variables (i.e. F = {n + 1}) and the only
solution is given by setting the n variables to the first n entries of column n + 1
of B.

When r < n, we have n - r > 0 free variables. Choose one free variable and set
all  the  other  free  variables  to  zero.  Now,  set  the  chosen  free  variable  to  any  fixed 
value.  It  is  possible  to  then  determine  the  values  of  the  dependent  variables  to  create 
a solution  to  the  system.  By  setting  the  chosen  free  variable  to  different  values,  in 
this  manner  we  can  create  infinitely  many  solutions.  ■ 

Subsection  FV 
Free  Variables 

The  next  theorem  simply  states  a conclusion  from  the  final  paragraph  of  the  previous 
proof,  allowing  us  to  state  explicitly  the  number  of  free  variables  for  a consistent 
system. 

Theorem  FVCS  Free  Variables  for  Consistent  Systems 

Suppose  A is  the  augmented  matrix  of  a consistent  system  of  linear  equations  with 
n variables. Suppose also that B is a row-equivalent matrix in reduced row-echelon
form with r rows that are not completely zeros. Then the solution set can be described
with n - r free variables.

Proof.  See  the  proof  of  Theorem  CSRN.  ■ 

Example  CFV  Counting  free  variables 

For  each  archetype  that  is  a system  of  equations,  the  values  of  n and  r are  listed. 
Many  also  contain  a few  sample  solutions.  We  can  use  this  information  profitably, 
as  illustrated  by  four  examples. 

1. Archetype A has n = 3 and r = 2. It can be seen to be consistent by the
sample solutions given. Its solution set then has n - r = 1 free variables, and
therefore will be infinite.

2. Archetype B has n = 3 and r = 3. It can be seen to be consistent by the single
sample solution given. Its solution set can then be described with n - r = 0
free variables, and therefore will have just the single solution.

3.  Archetype  H has  n = 2 and  r = 3.  In  this  case,  column  3 must  be  a pivot 
column,  so  by  Theorem  RCLS,  the  system  is  inconsistent.  We  should  not  try 
to  apply  Theorem  FVCS  to  count  free  variables,  since  the  theorem  only  applies 
to  consistent  systems.  (What  would  happen  if  you  did  try  to  incorrectly  apply 
Theorem  FVCS?) 

4. Archetype E has n = 4 and r = 3. However, by looking at the reduced row-echelon
form of the augmented matrix, we find that column 5 is a pivot column.
By Theorem RCLS we recognize the system as inconsistent.

△

We  have  accomplished  a lot  so  far,  but  our  main  goal  has  been  the  following 
theorem,  which  is  now  very  simple  to  prove.  The  proof  is  so  simple  that  we  ought  to 
call  it  a corollary,  but  the  result  is  important  enough  that  it  deserves  to  be  called  a 
theorem.  (See  Proof  Technique  LC.)  Notice  that  this  theorem  was  presaged  first  by 
Example  TTS  and  further  foreshadowed  by  other  examples. 

Theorem  PSSLS  Possible  Solution  Sets  for  Linear  Systems 

A system  of  linear  equations  has  no  solutions,  a unique  solution  or  infinitely  many 
solutions. 

Proof.  By  its  definition,  a system  is  either  inconsistent  or  consistent  (Definition  CS). 
The  first  case  describes  systems  with  no  solutions.  For  consistent  systems,  we  have 
the remaining two possibilities as guaranteed by, and described in, Theorem CSRN. ■


Here  is  a diagram  that  consolidates  several  of  our  theorems  from  this  section,  and 
which  is  of  practical  use  when  you  analyze  systems  of  equations.  Note  this  presumes 
we  have  the  reduced  row-echelon  form  of  the  augmented  matrix  of  the  system  to 
analyze. 


Theorem RCLS
    no pivot column in column n + 1: Consistent
        Theorem CSRN
            r < n: Infinite Solutions
            r = n: Unique Solution
    pivot column in column n + 1: Inconsistent

Diagram DTSLS: Decision Tree for Solving Linear Systems
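
The decision tree translates directly into a short function. What follows is a minimal Python sketch rather than anything taken from the text: the name `classify` and the choice to pass the pivot columns as a list of 1-based indices (the set D) are assumptions made here for illustration. The pivot columns used for the four archetypes are consistent with the values of n and r quoted in Example CFV, but the specific lists are supplied here as illustrative inputs.

```python
def classify(n, pivot_cols):
    """Classify a linear system in n variables.

    pivot_cols holds the 1-based indices of the pivot columns of the
    reduced row-echelon form of the augmented matrix (n + 1 columns).
    """
    r = len(pivot_cols)
    if n + 1 in pivot_cols:
        # Theorem RCLS: a pivot in the last column means no solutions.
        return "inconsistent"
    if r == n:
        # Theorem CSRN with r = n.
        return "unique solution"
    # Theorem CSRN with r < n; Theorem FVCS gives n - r free variables.
    return "infinitely many solutions"


print("Archetype A:", classify(3, [1, 2]))
print("Archetype B:", classify(3, [1, 2, 3]))
print("Archetype H:", classify(2, [1, 2, 3]))
print("Archetype E:", classify(4, [1, 2, 5]))
```

Notice that the function never asks how many equations the system has; once the pivot columns are known, n and D carry all the information the two theorems need.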

We have one more theorem to round out our set of tools for determining solution
sets  to  systems  of  linear  equations. 

Theorem  CMVEI  Consistent,  More  Variables  than  Equations,  Infinite  solutions 
Suppose  a consistent  system  of  linear  equations  has  m equations  in  n variables.  If 
n > m,  then  the  system  has  infinitely  many  solutions. 

Proof. Suppose that the augmented matrix of the system of equations is row-equivalent
to B, a matrix in reduced row-echelon form with r nonzero rows. Because
B has m rows in total, the number of nonzero rows is less than or equal to m. In
other words, r ≤ m. Follow this with the hypothesis that n > m and we find that
the system has a solution set described by at least one free variable because

\[
n - r \geq n - m > 0.
\]

A consistent  system  with  free  variables  will  have  an  infinite  number  of  solutions, 
as  given  by  Theorem  CSRN.  ■ 

Notice  that  to  use  this  theorem  we  need  only  know  that  the  system  is  consistent, 
together with the values of m and n. We do not necessarily have to compute a
row-equivalent  reduced  row-echelon  form  matrix,  even  though  we  discussed  such  a 
matrix  in  the  proof.  This  is  the  substance  of  the  following  example. 

Example  OSGMD  One  solution  gives  many,  Archetype  D 
Archetype D is the system of m = 3 equations in n = 4 variables,

\[
\begin{aligned}
2x_1 + x_2 + 7x_3 - 7x_4 &= 8 \\
-3x_1 + 4x_2 - 5x_3 - 6x_4 &= -12 \\
x_1 + x_2 + 4x_3 - 5x_4 &= 4
\end{aligned}
\]

and the solution x_1 = 0, x_2 = 1, x_3 = 2, x_4 = 1 can be checked easily by
substitution. Having been handed this solution, we know the system is consistent.
This, together with n > m, allows us to apply Theorem CMVEI and conclude that
the system has infinitely many solutions. △
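
The substitution check in this example is easy to mechanize. The sketch below is illustrative only: the helper name `is_solution` and the representation of the system as a list of coefficient rows `A` with a list of constants `b` are choices made here, not notation from the text.

```python
# Archetype D: coefficient rows and constants, copied from Example OSGMD.
A = [[2, 1, 7, -7],
     [-3, 4, -5, -6],
     [1, 1, 4, -5]]
b = [8, -12, 4]


def is_solution(A, b, x):
    """Return True when x satisfies every equation of the system."""
    return all(sum(a * xi for a, xi in zip(row, x)) == bi
               for row, bi in zip(A, b))


x = [0, 1, 2, 1]
m, n = len(A), len(A[0])

# One verified solution gives consistency; n > m then triggers Theorem CMVEI.
print("consistent:", is_solution(A, b, x))
print("more variables than equations:", n > m)
```

No row operations appear anywhere in this check, which is the point of the example.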

These theorems give us the procedures and implications that allow us to completely
solve any system of linear equations. The main computational tool is using
row  operations  to  convert  an  augmented  matrix  into  reduced  row-echelon  form.  Here 
is  a broad  outline  of  how  we  would  instruct  a computer  to  solve  a system  of  linear 
equations. 

1.  Represent  a system  of  linear  equations  in  n variables  by  an  augmented  matrix 
(an  array  is  the  appropriate  data  structure  in  most  computer  languages). 

2.  Convert  the  matrix  to  a row-equivalent  matrix  in  reduced  row-echelon  form 
using  the  procedure  from  the  proof  of  Theorem  REMEF.  Identify  the  location 
of  the  pivot  columns,  and  their  number  r. 

3. If column n + 1 is a pivot column, output the statement that the system is
inconsistent  and  halt. 

4.  If  column  n + 1 is  not  a pivot  column,  there  are  two  possibilities: 

(a)  r = n and  the  solution  is  unique.  It  can  be  read  off  directly  from  the 
entries  in  rows  1 through  n of  column  n + 1. 

(b)  r < n and  there  are  infinitely  many  solutions.  If  only  a single  solution 
is  needed,  set  all  the  free  variables  to  zero  and  read  off  the  dependent 
variable  values  from  column  n + 1,  as  in  the  second  half  of  the  proof  of 
Theorem  RCLS.  If  the  entire  solution  set  is  required,  figure  out  some  nice 
compact  way  to  describe  it,  since  your  finite  computer  is  not  big  enough 
to  hold  all  the  solutions  (we  will  have  such  a way  soon). 
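
The outline above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the procedure from the proof of Theorem REMEF verbatim: the function names `rref` and `solve` are invented here, a list of lists stands in for the array, column indices are 0-based inside the code, and exact rational arithmetic (`fractions.Fraction`) is used so that round-off does not cloud the logic. In case 4(b) the sketch returns only the single solution obtained by setting every free variable to zero.

```python
from fractions import Fraction


def rref(M):
    """Return (B, pivots): a reduced row-echelon form of M and the
    0-based indices of its pivot columns."""
    B = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(B), len(B[0])
    pivots = []
    r = 0  # index of the row that will receive the next pivot
    for c in range(cols):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, rows) if B[i][c] != 0), None)
        if p is None:
            continue
        B[r], B[p] = B[p], B[r]                 # swap rows
        B[r] = [v / B[r][c] for v in B[r]]      # scale to a leading 1
        for i in range(rows):                   # clear the rest of column c
            if i != r and B[i][c] != 0:
                f = B[i][c]
                B[i] = [vi - f * vr for vi, vr in zip(B[i], B[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return B, pivots


def solve(aug):
    """Steps 2 through 4 of the outline for an augmented matrix."""
    n = len(aug[0]) - 1
    B, pivots = rref(aug)
    if n in pivots:                  # 0-based index n is column n + 1
        return "inconsistent", None
    x = [Fraction(0)] * n            # free variables are set to zero
    for row, c in enumerate(pivots):
        x[c] = B[row][n]             # dependent variables from column n + 1
    kind = "unique" if len(pivots) == n else "infinite"
    return kind, x


# Augmented matrix of Archetype D (Example OSGMD).
archetype_d = [[2, 1, 7, -7, 8],
               [-3, 4, -5, -6, -12],
               [1, 1, 4, -5, 4]]
print(solve(archetype_d))
```

Describing the entire solution set in case 4(b) is deliberately left out, just as the outline postpones it.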

The  above  makes  it  all  sound  a bit  simpler  than  it  really  is.  In  practice,  row 
operations  employ  division  (usually  to  get  a leading  entry  of  a row  to  convert  to 
a leading  1)  and  that  will  introduce  round-off  errors.  Entries  that  should  be  zero 
sometimes  end  up  being  very,  very  small  nonzero  entries,  or  small  entries  lead  to 
overflow  errors  when  used  as  divisors.  A variety  of  strategies  can  be  employed  to 
minimize  these  sorts  of  errors,  and  this  is  one  of  the  main  topics  in  the  important 
subject  known  as  numerical  linear  algebra. 
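
A tiny experiment shows the kind of residue the paragraph above describes. This sketch is an illustration added here, not part of the text's development: it compares one expression in exact rational arithmetic and in binary floating point. The helper `is_zero` and its tolerance are assumptions chosen for the demonstration; picking tolerances well is one of the questions numerical linear algebra addresses.

```python
from fractions import Fraction

# In exact arithmetic, 3 * (1/10) - 3/10 is zero.
exact = 3 * Fraction(1, 10) - Fraction(3, 10)

# In floating point, 0.1 and 0.3 are only approximated, so the same
# expression leaves a tiny nonzero value where a zero was expected.
approx = 3 * 0.1 - 0.3

print("exact result is zero: ", exact == 0)
print("float result is zero: ", approx == 0)
print("float result:         ", approx)


def is_zero(v, tol=1e-12):
    """Treat values within tol of zero as zero (tol is an arbitrary choice)."""
    return abs(v) < tol


print("zero within tolerance:", is_zero(approx))
```

During row reduction, an entry like `approx` could be mistaken for a nonzero leading entry and then used as a divisor, which is exactly the failure the paragraph warns about.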

In  this  section  we  have  gained  a foolproof  procedure  for  solving  any  system  of 
linear  equations,  no  matter  how  many  equations  or  variables.  We  also  have  a handful 
of  theorems  that  allow  us  to  determine  partial  information  about  a solution  set 
without  actually  constructing  the  whole  set  itself.  Donald  Knuth  would  be  proud. 

Reading  Questions 


1.  How  can  we  easily  recognize  when  a system  of  linear  equations  is  inconsistent  or  not? 

2.  Suppose  we  have  converted  the  augmented  matrix  of  a system  of  equations  into  reduced 
row-echelon  form.  How  do  we  then  identify  the  dependent  and  independent  (free) 
variables? 

3.  What  are  the  possible  solution  sets  for  a system  of  linear  equations? 

Exercises 

C10 In the spirit of Example ISSI, describe the infinite solution set for Archetype J.

For  Exercises  C21-C28,  find  the  solution  set  of  the  system  of  linear  equations.  Give  the 
values  of  n and  r,  and  interpret  your  answers  in  light  of  the  theorems  of  this  section. 
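
C21†
\[
\begin{aligned}
x_1 + 4x_2 + 3x_3 - x_4 &= 5 \\
x_1 - x_2 + x_3 + 2x_4 &= 6 \\
4x_1 + x_2 + 6x_3 + 5x_4 &= 9
\end{aligned}
\]

C22†
\[
\begin{aligned}
x_1 - 2x_2 + x_3 - x_4 &= 3 \\
2x_1 - 4x_2 + x_3 + x_4 &= 2 \\
x_1 - 2x_2 - 2x_3 + 3x_4 &= 1
\end{aligned}
\]

C23†
\[
\begin{aligned}
x_1 - 2x_2 + x_3 - x_4 &= 3 \\
x_1 + x_2 + x_3 - x_4 &= 1 \\
x_1 + x_3 - x_4 &= 2
\end{aligned}
\]

C24†
\[
\begin{aligned}
x_1 - 2x_2 + x_3 - x_4 &= 2 \\
x_1 + x_2 + x_3 - x_4 &= 2 \\
x_1 + x_3 - x_4 &= 2
\end{aligned}
\]

C25†
\[
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 1 \\
2x_1 - x_2 + x_3 &= 2 \\
3x_1 + x_2 + x_3 &= 4 \\
x_2 + 2x_3 &= 6
\end{aligned}
\]

C26†
\[
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 1 \\
2x_1 - x_2 + x_3 &= 2 \\
3x_1 + x_2 + x_3 &= 4 \\
5x_2 + 2x_3 &= 1
\end{aligned}
\]

C27†
\[
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 0 \\
2x_1 - x_2 + x_3 &= 2 \\
x_1 - 8x_2 - 7x_3 &= 1 \\
x_2 + x_3 &= 0
\end{aligned}
\]

C28†
\[
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 1 \\
2x_1 - x_2 + x_3 &= 2 \\
x_1 - 8x_2 - 7x_3 &= 1 \\
x_2 + x_3 &= 0
\end{aligned}
\]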


M45† The details for Archetype J include several sample solutions. Verify that one of
these  solutions  is  correct  (any  one,  but  just  one).  Based  only  on  this  evidence,  and  especially 
without  doing  any  row  operations,  explain  how  you  know  this  system  of  linear  equations 
has  infinitely  many  solutions. 

M46  Consider  Archetype  J,  and  specifically  the  row-reduced  version  of  the  augmented 
matrix  of  the  system  of  equations,  denoted  as  B here,  and  the  values  of  r,  D and  F 
immediately  following.  Determine  the  values  of  the  entries 


\[
[B]_{1,d_1} \quad [B]_{3,d_3} \quad [B]_{1,d_3} \quad [B]_{3,d_1} \quad [B]_{d_1,1} \quad [B]_{d_3,3} \quad [B]_{d_1,3} \quad [B]_{d_3,1} \quad [B]_{1,f_1} \quad [B]_{3,f_1}
\]

(See Exercise TSS.M70 for a generalization.)


For  Exercises  M51-M57  say  as  much  as  possible  about  each  system’s  solution  set.  Be 
sure  to  make  it  clear  which  theorems  you  are  using  to  reach  your  conclusions. 

M51† A consistent system of 8 equations in 6 variables.

M52† A consistent system of 6 equations in 8 variables.

M53† A system of 5 equations in 9 variables.

M54† A system with 12 equations in 35 variables.

M56† A system with 6 equations in 12 variables.

M57† A system with 8 equations and 6 variables. The reduced row-echelon form of
the augmented matrix of the system has 7 pivot columns.

M60  Without  doing  any  computations,  and  without  examining  any  solutions,  say  as  much 
as  possible  about  the  form  of  the  solution  set  for  each  archetype  that  is  a system  of  equations. 

Archetype  A,  Archetype  B,  Archetype  C,  Archetype  D,  Archetype  E,  Archetype  F,  Archetype 
G,  Archetype  H,  Archetype  I,  Archetype  J 

M70  Suppose  that  B is  a matrix  in  reduced  row-echelon  form  that  is  equivalent  to  the 
augmented  matrix  of  a system  of  equations  with  m equations  in  n variables.  Let  r,  D and  F 
be  as  defined  in  Definition  RREF.  What  can  you  conclude,  in  general,  about  the  following 
entries? 

\[
[B]_{1,d_1} \quad [B]_{3,d_3} \quad [B]_{1,d_3} \quad [B]_{3,d_1} \quad [B]_{d_1,1} \quad [B]_{d_3,3} \quad [B]_{d_1,3} \quad [B]_{d_3,1} \quad [B]_{1,f_1} \quad [B]_{3,f_1}
\]

If you cannot conclude anything about an entry, then say so. (See Exercise TSS.M46.)

T10† An inconsistent system may have r > n. If we try (incorrectly!) to apply Theorem
FVCS to such a system, how many free variables would we discover?


§TSS 


Beezer:  A First  Course  in  Linear  Algebra 


57 


T11† Suppose A is the augmented matrix of a system of linear equations in n variables,
and  that  B is  a row-equivalent  matrix  in  reduced  row-echelon  form  with  r pivot  columns. 
If  r = n + 1,  prove  that  the  system  of  equations  is  inconsistent. 

T20  Suppose  that  B is  a matrix  in  reduced  row-echelon  form  that  is  equivalent  to  the 
augmented  matrix  of  a system  of  equations  with  m equations  in  n variables.  Let  r,  D and 
F be as defined in Definition RREF. Prove that d_k ≥ k for all 1 ≤ k ≤ r. Then suppose
that r ≥ 2 and 1 ≤ k < ℓ ≤ r and determine what you can conclude, in general, about the
following entries.

\[
[B]_{k,d_k} \quad [B]_{\ell,d_k} \quad [B]_{d_k,k} \quad [B]_{d_k,\ell} \quad [B]_{d_\ell,k} \quad [B]_{d_k,f_\ell} \quad [B]_{d_\ell,f_k}
\]

If  you  cannot  conclude  anything  about  an  entry,  then  say  so.  (See  Exercise  TSS.M46  and 
Exercise  TSS.M70.) 

T40† Suppose that the coefficient matrix of a consistent system of linear equations has
two columns that are identical. Prove that the system has infinitely many solutions.

T41† Consider the system of linear equations LS(A, b), and suppose that every element
of the vector of constants b is a common multiple of the corresponding element of a certain
column of A. More precisely, there is a complex number α, and a column index j, such
that [b]_i = α[A]_{ij} for all i. Prove that the system is consistent.


Section  HSE 

Homogeneous  Systems  of  Equations 


In  this  section  we  specialize  to  systems  of  linear  equations  where  every  equation 
has  a zero  as  its  constant  term.  Along  the  way,  we  will  begin  to  express  more  and 
more  ideas  in  the  language  of  matrices  and  begin  a move  away  from  writing  out 
whole  systems  of  equations.  The  ideas  initiated  in  this  section  will  carry  through 
the  remainder  of  the  course. 

Subsection  SHS 

Solutions  of  Homogeneous  Systems 

As  usual,  we  begin  with  a definition. 

Definition  HS  Homogeneous  System 

A system of linear equations, LS(A, b), is homogeneous if the vector of constants
is the zero vector, in other words, if b = 0. □

Example  AHSAC  Archetype  C as  a homogeneous  system 

For  each  archetype  that  is  a system  of  equations,  we  have  formulated  a similar,  yet 
different,  homogeneous  system  of  equations  by  replacing  each  equation’s  constant 
term  with  a zero.  To  wit,  for  Archetype  C,  we  can  convert  the  original  system  of 
equations  into  the  homogeneous  system, 

\[
\begin{aligned}
2x_1 - 3x_2 + x_3 - 6x_4 &= 0 \\
4x_1 + x_2 + 2x_3 + 9x_4 &= 0 \\
3x_1 + x_2 + x_3 + 8x_4 &= 0
\end{aligned}
\]

Can you quickly find a solution to this system without row-reducing the augmented
matrix? △

As  you  might  have  discovered  by  studying  Example  AHSAC,  setting  each  variable 
to  zero  will  always  be  a solution  of  a homogeneous  system.  This  is  the  substance  of 
the  following  theorem. 

Theorem  HSC  Homogeneous  Systems  are  Consistent 

Suppose  that  a system  of  linear  equations  is  homogeneous.  Then  the  system  is 
consistent  and  one  solution  is  found  by  setting  each  variable  to  zero. 

Proof.  Set  each  variable  of  the  system  to  zero.  When  substituting  these  values  into 
each  equation,  the  left-hand  side  evaluates  to  zero,  no  matter  what  the  coefficients 
are.  Since  a homogeneous  system  has  zero  on  the  right-hand  side  of  each  equation 
as  the  constant  term,  each  equation  is  true.  With  one  demonstrated  solution,  we 
can  call  the  system  consistent.  ■ 

Since this solution is so obvious, we now define it as the trivial solution.

Definition  TSHSE  Trivial  Solution  to  Homogeneous  Systems  of  Equations 
Suppose  a homogeneous  system  of  linear  equations  has  n variables.  The  solution 
x_1 = 0, x_2 = 0, ..., x_n = 0 (i.e. x = 0) is called the trivial solution. □

Here  are  three  typical  examples,  which  we  will  reference  throughout  this  section. 
Work  through  the  row  operations  as  we  bring  each  to  reduced  row-echelon  form. 
Also  notice  what  is  similar  in  each  example,  and  what  differs. 

Example  HUSAB  Homogeneous,  unique  solution,  Archetype  B 
Archetype  B can  be  converted  to  the  homogeneous  system, 

\[
\begin{aligned}
-7x_1 - 6x_2 - 12x_3 &= 0 \\
5x_1 + 5x_2 + 7x_3 &= 0 \\
x_1 + 4x_3 &= 0
\end{aligned}
\]

whose augmented matrix row-reduces to

\[
\begin{bmatrix}
\boxed{1} & 0 & 0 & 0 \\
0 & \boxed{1} & 0 & 0 \\
0 & 0 & \boxed{1} & 0
\end{bmatrix}
\]

By Theorem HSC, the system is consistent, and so the computation n - r =
3 - 3 = 0 means the solution set contains just a single solution. Then, this lone
solution must be the trivial solution. △


Example  HISAA  Homogeneous,  infinite  solutions,  Archetype  A 
Archetype  A can  be  converted  to  the  homogeneous  system, 

\[
\begin{aligned}
x_1 - x_2 + 2x_3 &= 0 \\
2x_1 + x_2 + x_3 &= 0 \\
x_1 + x_2 &= 0
\end{aligned}
\]

whose augmented matrix row-reduces to

\[
\begin{bmatrix}
\boxed{1} & 0 & 1 & 0 \\
0 & \boxed{1} & -1 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]

By Theorem HSC, the system is consistent, and so the computation n - r =
3 - 2 = 1 means the solution set contains one free variable by Theorem FVCS, and
hence has infinitely many solutions. We can describe this solution set using the free
variable x_3,

\[
S = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \,\middle|\, x_1 = -x_3,\ x_2 = x_3 \right\}
  = \left\{ \begin{bmatrix} -x_3 \\ x_3 \\ x_3 \end{bmatrix} \,\middle|\, x_3 \in \mathbb{C} \right\}
\]

Geometrically, these are points in three dimensions that lie on a line through the
origin. △

Example  HISAD  Homogeneous,  infinite  solutions,  Archetype  D 

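Archetype D (and identically, Archetype E) can be converted to the homogeneous
system,

\[
\begin{aligned}
2x_1 + x_2 + 7x_3 - 7x_4 &= 0 \\
-3x_1 + 4x_2 - 5x_3 - 6x_4 &= 0 \\
x_1 + x_2 + 4x_3 - 5x_4 &= 0
\end{aligned}
\]

whose augmented matrix row-reduces to

\[
\begin{bmatrix}
\boxed{1} & 0 & 3 & -2 & 0 \\
0 & \boxed{1} & 1 & -3 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

By Theorem HSC, the system is consistent, and so the computation n - r =
4 - 2 = 2 means the solution set contains two free variables by Theorem FVCS, and
hence has infinitely many solutions. We can describe this solution set using the free
variables x_3 and x_4,

\[
S = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \,\middle|\, x_1 = -3x_3 + 2x_4,\ x_2 = -x_3 + 3x_4 \right\}
  = \left\{ \begin{bmatrix} -3x_3 + 2x_4 \\ -x_3 + 3x_4 \\ x_3 \\ x_4 \end{bmatrix} \,\middle|\, x_3,\, x_4 \in \mathbb{C} \right\}
\]

△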

After  working  through  these  examples,  you  might  perform  the  same  computations 
for  the  slightly  larger  example,  Archetype  J. 

Notice  that  when  we  do  row  operations  on  the  augmented  matrix  of  a homogeneous 
system  of  linear  equations  the  last  column  of  the  matrix  is  all  zeros.  Any  one  of 
the  three  allowable  row  operations  will  convert  zeros  to  zeros  and  thus,  the  final 
column  of  the  matrix  in  reduced  row-echelon  form  will  also  be  all  zeros.  So  in  this 
case,  we  may  be  as  likely  to  reference  only  the  coefficient  matrix  and  presume  that 
we  remember  that  the  final  column  begins  with  zeros,  and  after  any  number  of  row 
operations  is  still  zero. 
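
The claim that each row operation sends a zero column to a zero column can be checked on a small case. The sketch below is an added illustration, not the text's notation: the function names `swap`, `scale` and `add_multiple` are invented here, rows are indexed from 0, and the matrix is the augmented matrix of the homogeneous system in Example HISAA.

```python
def swap(M, i, j):
    """Row operation 1: swap rows i and j."""
    M = [row[:] for row in M]
    M[i], M[j] = M[j], M[i]
    return M


def scale(M, alpha, i):
    """Row operation 2: multiply row i by a nonzero scalar alpha."""
    if alpha == 0:
        raise ValueError("alpha must be nonzero")
    M = [row[:] for row in M]
    M[i] = [alpha * v for v in M[i]]
    return M


def add_multiple(M, alpha, i, j):
    """Row operation 3: add alpha times row i to row j."""
    M = [row[:] for row in M]
    M[j] = [vj + alpha * vi for vi, vj in zip(M[i], M[j])]
    return M


def last_column(M):
    return [row[-1] for row in M]


# Augmented matrix of the homogeneous system from Example HISAA.
M = [[1, -1, 2, 0],
     [2, 1, 1, 0],
     [1, 1, 0, 0]]

# Each printed column is [0, 0, 0]: zeros are swapped with zeros, scaled
# to zeros, and combined into zeros.
for step in (swap(M, 0, 2), scale(M, -3, 1), add_multiple(M, -2, 0, 1)):
    print(last_column(step))
```

A single example is not a proof, of course; the general argument is the one given in the paragraph above.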

Example  HISAD  suggests  the  following  theorem. 

Theorem  HMVEI  Homogeneous,  More  Variables  than  Equations,  Infinite  solutions 
Suppose  that  a homogeneous  system  of  linear  equations  has  m equations  and  n 
variables with n > m. Then the system has infinitely many solutions.

Proof.  We  are  assuming  the  system  is  homogeneous,  so  Theorem  HSC  says  it  is 
consistent.  Then  the  hypothesis  that  n > m,  together  with  Theorem  CMVEI,  gives 
infinitely  many  solutions.  ■ 

Example HUSAB and Example HISAA are concerned with homogeneous systems
where  n = m and  expose  a fundamental  distinction  between  the  two  examples.  One 
has  a unique  solution,  while  the  other  has  infinitely  many.  These  are  exactly  the 
only  two  possibilities  for  a homogeneous  system  and  illustrate  that  each  is  possible 
(unlike  the  case  when  n > m where  Theorem  HMVEI  tells  us  that  there  is  only  one 
possibility  for  a homogeneous  system). 

Subsection  NSM 
Null  Space  of  a Matrix 

The  set  of  solutions  to  a homogeneous  system  (which  by  Theorem  HSC  is  never 
empty)  is  of  enough  interest  to  warrant  its  own  name.  However,  we  define  it  as  a 
property  of  the  coefficient  matrix,  not  as  a property  of  some  system  of  equations. 

Definition  NSM  Null  Space  of  a Matrix 

The null space of a matrix A, denoted N(A), is the set of all the vectors that are
solutions to the homogeneous system LS(A, 0). □

In  the  Archetypes  (Archetypes)  each  example  that  is  a system  of  equations  also 
has  a corresponding  homogeneous  system  of  equations  listed,  and  several  sample 
solutions  are  given.  These  solutions  will  be  elements  of  the  null  space  of  the  coefficient 
matrix.  We  will  look  at  one  example. 

Example  NSEAI  Null  space  elements  of  Archetype  I 

The  write-up  for  Archetype  I lists  several  solutions  of  the  corresponding  homogeneous 
system.  Here  are  two,  written  as  solution  vectors.  We  can  say  that  they  are  in  the 
null  space  of  the  coefficient  matrix  for  the  system  of  equations  in  Archetype  I. 


\[
x = \begin{bmatrix} 3 \\ 0 \\ -5 \\ -6 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\qquad
y = \begin{bmatrix} -4 \\ 1 \\ -3 \\ -2 \\ 1 \\ 1 \\ 1 \end{bmatrix}
\]

However, the vector

\[
z = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 2 \end{bmatrix}
\]

is not in the null space, since it is not a solution to the homogeneous system. For
example, it fails to even make the first equation true. △


Here  are  two  (prototypical)  examples  of  the  computation  of  the  null  space  of  a 
matrix. 


Example  CNS1  Computing  a null  space,  no.  1 
Let  us  compute  the  null  space  of 


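\[
A = \begin{bmatrix}
2 & -1 & 7 & -3 & -8 \\
1 & 0 & 2 & 4 & 9 \\
2 & 2 & -2 & -1 & 8
\end{bmatrix}
\]

which we write as N(A). Translating Definition NSM, we simply desire to solve the
homogeneous system LS(A, 0). So we row-reduce the augmented matrix to obtain

\[
\begin{bmatrix}
\boxed{1} & 0 & 2 & 0 & 1 & 0 \\
0 & \boxed{1} & -3 & 0 & 4 & 0 \\
0 & 0 & 0 & \boxed{1} & 2 & 0
\end{bmatrix}
\]

The variables (of the homogeneous system) x_3 and x_5 are free (since columns 1,
2 and 4 are pivot columns), so we arrange the equations represented by the matrix
in reduced row-echelon form to

\[
\begin{aligned}
x_1 &= -2x_3 - x_5 \\
x_2 &= 3x_3 - 4x_5 \\
x_4 &= -2x_5
\end{aligned}
\]

So we can write the infinite solution set as sets using column vectors,

\[
\mathcal{N}(A) = \left\{ \begin{bmatrix} -2x_3 - x_5 \\ 3x_3 - 4x_5 \\ x_3 \\ -2x_5 \\ x_5 \end{bmatrix} \,\middle|\, x_3,\, x_5 \in \mathbb{C} \right\}
\]

△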
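
A description of a null space like the one in Example CNS1 can be spot-checked by substitution. The following sketch is an added illustration: the helper names `null_vector` and `times` are invented here, and only integer values of the free variables are tried, although the example allows any complex numbers.

```python
# The matrix A of Example CNS1.
A = [[2, -1, 7, -3, -8],
     [1, 0, 2, 4, 9],
     [2, 2, -2, -1, 8]]


def null_vector(x3, x5):
    """Build the element of the null space determined by free variables x3, x5."""
    return [-2 * x3 - x5, 3 * x3 - 4 * x5, x3, -2 * x5, x5]


def times(A, x):
    """Evaluate the left-hand side of every equation of the system at x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


for x3, x5 in [(1, 0), (0, 1), (2, -3)]:
    v = null_vector(x3, x5)
    print(v, "->", times(A, v))
```

Each choice of the free variables should produce a vector that every equation of the homogeneous system sends to zero. A check like this catches sign errors in the description, but it cannot show that the description captures every solution; that still rests on Theorem FVCS.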


Example  CNS2  Computing  a null  space,  no.  2 
Let us compute the null space of

\[
C = \begin{bmatrix}
-4 & 6 & 1 \\
-1 & 4 & 1 \\
5 & 6 & 7 \\
4 & 7 & 1
\end{bmatrix}
\]

which we write as N(C). Translating Definition NSM, we simply desire to solve the
homogeneous system LS(C, 0). So we row-reduce the augmented matrix to obtain

\[
\begin{bmatrix}
\boxed{1} & 0 & 0 & 0 \\
0 & \boxed{1} & 0 & 0 \\
0 & 0 & \boxed{1} & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]

There are no free variables in the homogeneous system represented by the row-reduced
matrix, so there is only the trivial solution, the zero vector, 0. So we can
write the (trivial) solution set as

\[
\mathcal{N}(C) = \{0\} = \left\{ \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \right\}
\]

△


Reading  Questions 


1.  What  is  always  true  of  the  solution  set  for  a homogeneous  system  of  equations? 

2.  Suppose  a homogeneous  system  of  equations  has  13  variables  and  8 equations.  How 
many  solutions  will  it  have?  Why? 

3.  Describe,  using  only  words,  the  null  space  of  a matrix.  (So  in  particular,  do  not  use  any 
symbols.) 

Exercises 

C10 Each Archetype (Archetypes) that is a system of equations has a corresponding
homogeneous  system  with  the  same  coefficient  matrix.  Compute  the  set  of  solutions  for 
each.  Notice  that  these  solution  sets  are  the  null  spaces  of  the  coefficient  matrices. 

Archetype  A,  Archetype  B,  Archetype  C,  Archetype  D/Archetype  E,  Archetype  F,  Archetype 
G/Archetype  H,  Archetype  I,  Archetype  J 

C20  Archetype  K and  Archetype  L are  simply  5x5  matrices  (i.e.  they  are  not  systems 
of  equations).  Compute  the  null  space  of  each  matrix. 

For  Exercises  C21-C23,  solve  the  given  homogeneous  linear  system.  Compare  your  results 
to  the  results  of  the  corresponding  exercise  in  Section  TSS. 

C21†
\[
\begin{aligned}
x_1 + 4x_2 + 3x_3 - x_4 &= 0 \\
x_1 - x_2 + x_3 + 2x_4 &= 0 \\
4x_1 + x_2 + 6x_3 + 5x_4 &= 0
\end{aligned}
\]

C22†
\[
\begin{aligned}
x_1 - 2x_2 + x_3 - x_4 &= 0 \\
2x_1 - 4x_2 + x_3 + x_4 &= 0 \\
x_1 - 2x_2 - 2x_3 + 3x_4 &= 0
\end{aligned}
\]

C23†
\[
\begin{aligned}
x_1 - 2x_2 + x_3 - x_4 &= 0 \\
x_1 + x_2 + x_3 - x_4 &= 0 \\
x_1 + x_3 - x_4 &= 0
\end{aligned}
\]


For  Exercises  C25-C27,  solve  the  given  homogeneous  linear  system.  Compare  your  results 
to  the  results  of  the  corresponding  exercise  in  Section  TSS. 

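C25†
\[
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 0 \\
2x_1 - x_2 + x_3 &= 0 \\
3x_1 + x_2 + x_3 &= 0 \\
x_2 + 2x_3 &= 0
\end{aligned}
\]

C26†
\[
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 0 \\
2x_1 - x_2 + x_3 &= 0 \\
3x_1 + x_2 + x_3 &= 0 \\
5x_2 + 2x_3 &= 0
\end{aligned}
\]

C27†
\[
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 0 \\
2x_1 - x_2 + x_3 &= 0 \\
x_1 - 8x_2 - 7x_3 &= 0 \\
x_2 + x_3 &= 0
\end{aligned}
\]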
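
C30† Compute the null space of the matrix A, N(A).

\[
A = \begin{bmatrix}
2 & 4 & 1 & 3 & 8 \\
-1 & -2 & -1 & -1 & 1 \\
2 & 4 & 0 & -3 & 4 \\
2 & 4 & -1 & -7 & 4
\end{bmatrix}
\]

C31† Find the null space of the matrix B, N(B).

\[
B = \begin{bmatrix}
-6 & 4 & -36 & 6 \\
2 & -1 & 10 & -1 \\
-3 & 2 & -18 & 3
\end{bmatrix}
\]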


M45  Without  doing  any  computations,  and  without  examining  any  solutions,  say  as 
much  as  possible  about  the  form  of  the  solution  set  for  corresponding  homogeneous  system 
of  equations  of  each  archetype  that  is  a system  of  equations. 


Archetype  A,  Archetype  B,  Archetype  C,  Archetype  D/Archetype  E,  Archetype  F,  Archetype 
G/Archetype  H,  Archetype  I,  Archetype  J 


For  Exercises  M50-M52  say  as  much  as  possible  about  each  system’s  solution  set.  Be 
sure  to  make  it  clear  which  theorems  you  are  using  to  reach  your  conclusions. 

M50† A homogeneous system of 8 equations in 8 variables.

M51† A homogeneous system of 8 equations in 9 variables.

M52† A homogeneous system of 8 equations in 7 variables.

T10† Prove or disprove: A system of linear equations is homogeneous if and only if the
system  has  the  zero  vector  as  a solution. 

T11† Suppose that two systems of linear equations are equivalent. Prove that if the first
system  is  homogeneous,  then  the  second  system  is  homogeneous.  Notice  that  this  will 
allow  us  to  conclude  that  two  equivalent  systems  are  either  both  homogeneous  or  both  not 
homogeneous. 

T12  Give  an  alternate  proof  of  Theorem  HSC  that  uses  Theorem  RCLS. 

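T20† Consider the homogeneous system of linear equations LS(A, 0), and suppose that

\[
u = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_n \end{bmatrix}
\]

is one solution to the system of equations. Prove that

\[
v = \begin{bmatrix} 4u_1 \\ 4u_2 \\ 4u_3 \\ \vdots \\ 4u_n \end{bmatrix}
\]

is also a solution to LS(A, 0).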


Section  NM 
Nonsingular  Matrices 


In  this  section  we  specialize  further  and  consider  matrices  with  equal  numbers  of 
rows  and  columns,  which  when  considered  as  coefficient  matrices  lead  to  systems 
with  equal  numbers  of  equations  and  variables.  We  will  see  in  the  second  half  of 
the  course  (Chapter  D,  Chapter  E,  Chapter  LT,  Chapter  R)  that  these  matrices  are 
especially  important. 

Subsection  NM 
Nonsingular  Matrices 

Our theorems will now establish connections between systems of equations (homogeneous
or otherwise), augmented matrices representing those systems, coefficient
matrices,  constant  vectors,  the  reduced  row-echelon  form  of  matrices  (augmented  and 
coefficient)  and  solution  sets.  Be  very  careful  in  your  reading,  writing  and  speaking 
about  systems  of  equations,  matrices  and  sets  of  vectors.  A system  of  equations  is 
not  a matrix,  a matrix  is  not  a solution  set,  and  a solution  set  is  not  a system  of 
equations.  Now  would  be  a great  time  to  review  the  discussion  about  speaking  and 
writing  mathematics  in  Proof  Technique  L. 

Definition  SQM  Square  Matrix 

A matrix  with  m rows  and  n columns  is  square  if  m = n.  In  this  case,  we  say  the 
matrix  has  size  n.  To  emphasize  the  situation  when  a matrix  is  not  square,  we  will 
call  it  rectangular.  □ 

We  can  now  present  one  of  the  central  definitions  of  linear  algebra. 

Definition  NM  Nonsingular  Matrix 

Suppose A is a square matrix. Suppose further that the solution set to the homogeneous
linear system of equations LS(A, 0) is {0}, in other words, the system has
only  the  trivial  solution.  Then  we  say  that  A is  a nonsingular  matrix.  Otherwise 
we  say  A is  a singular  matrix.  □ 

We  can  investigate  whether  any  square  matrix  is  nonsingular  or  not,  no  matter  if 
the  matrix  is  derived  somehow  from  a system  of  equations  or  if  it  is  simply  a matrix. 
The  definition  says  that  to  perform  this  investigation  we  must  construct  a very 
specific  system  of  equations  (homogeneous,  with  the  matrix  as  the  coefficient  matrix) 
and  look  at  its  solution  set.  We  will  have  theorems  in  this  section  that  connect 
nonsingular  matrices  with  systems  of  equations,  creating  more  opportunities  for 
confusion.  Convince  yourself  now  of  two  observations,  (1)  we  can  decide  nonsingularity 
for  any  square  matrix,  and  (2)  the  determination  of  nonsingularity  involves  the 
solution  set  for  a certain  homogeneous  system  of  equations. 

Notice  that  it  makes  no  sense  to  call  a system  of  equations  nonsingular  (the  term 
does not apply to a system of equations), nor does it make any sense to call a 5 × 7
matrix  singular  (the  matrix  is  not  square). 


Example  S A singular  matrix,  Archetype  A 

Example HISAA shows that the coefficient matrix derived from Archetype A, specifically
the 3 × 3 matrix,

\[
A = \begin{bmatrix}
1 & -1 & 2 \\
2 & 1 & 1 \\
1 & 1 & 0
\end{bmatrix}
\]

is a singular matrix since there are nontrivial solutions to the homogeneous system
LS(A, 0). △


Example  NM  A nonsingular  matrix,  Archetype  B 

Example  HUSAB  shows  that  the  coefficient  matrix  derived  from  Archetype  B, 
specifically the 3 × 3 matrix,

\[
B = \begin{bmatrix}
-7 & -6 & -12 \\
5 & 5 & 7 \\
1 & 0 & 4
\end{bmatrix}
\]

is a nonsingular matrix since the homogeneous system, LS(B, 0), has only the trivial
solution. △


Notice that we will not discuss Example HISAD as being a singular or nonsingular
coefficient  matrix  since  the  matrix  is  not  square. 

The  next  theorem  combines  with  our  main  computational  technique  (row  reducing 
a matrix)  to  make  it  easy  to  recognize  a nonsingular  matrix.  But  first  a definition. 


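Definition IM Identity Matrix

The $m\times m$ identity matrix, $I_m$, is defined by

$$
[I_m]_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases} \qquad 1 \leq i,\, j \leq m
$$

□

Example IM An identity matrix
The $4\times 4$ identity matrix is

$$
I_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
$$

△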


Notice that an identity matrix is square, and in reduced row-echelon form. Also,
every column is a pivot column, and every possible pivot column appears once.

Theorem NMRRI Nonsingular Matrices Row Reduce to the Identity matrix
Suppose that $A$ is a square matrix and $B$ is a row-equivalent matrix in reduced
row-echelon form. Then $A$ is nonsingular if and only if $B$ is the identity matrix.

Proof. ($\Leftarrow$) Suppose $B$ is the identity matrix. When the augmented matrix $[A \mid \mathbf{0}]$
is row-reduced, the result is $[B \mid \mathbf{0}] = [I_n \mid \mathbf{0}]$. The number of nonzero rows is equal
to the number of variables in the linear system of equations $\mathcal{LS}(A, \mathbf{0})$, so $n = r$
and Theorem FVCS gives $n - r = 0$ free variables. Thus, the homogeneous system
$\mathcal{LS}(A, \mathbf{0})$ has just one solution, which must be the trivial solution. This is exactly
the definition of a nonsingular matrix (Definition NM).

($\Rightarrow$) If $A$ is nonsingular, then the homogeneous system $\mathcal{LS}(A, \mathbf{0})$ has a unique
solution, and has no free variables in the description of the solution set. The
homogeneous system is consistent (Theorem HSC) so Theorem FVCS applies and tells
us there are $n - r$ free variables. Thus, $n - r = 0$, and so $n = r$. So $B$ has $n$ pivot
columns among its total of $n$ columns. This is enough to force $B$ to be the $n \times n$
identity matrix $I_n$ (see Exercise NM.T12). ■


Notice  that  since  this  theorem  is  an  equivalence  it  will  always  allow  us  to  determine 
if  a matrix  is  either  nonsingular  or  singular.  Here  are  two  examples  of  this,  continuing 
our  study  of  Archetype  A and  Archetype  B. 


Example  SRR  Singular  matrix,  row-reduced 

We  have  the  coefficient  matrix  for  Archetype  A and  a row-equivalent  matrix  B in 
reduced  row-echelon  form, 


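$$
A = \begin{bmatrix} 1 & -1 & 2 \\ 2 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix} = B
$$

Since $B$ is not the $3\times 3$ identity matrix, Theorem NMRRI tells us that $A$ is a
singular matrix. △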
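
Theorem NMRRI reduces the question of nonsingularity to a row-reduction, which is easy to mechanize. What follows is a minimal sketch in plain Python rather than the Sage commands discussed elsewhere in this book. The function names `rref` and `is_nonsingular` are illustrative choices, not part of any library, and exact `Fraction` arithmetic is assumed so that no rounding occurs. It tests the coefficient matrices of Archetype A and Archetype B.

```python
from fractions import Fraction


def rref(matrix):
    """Return the reduced row-echelon form of a matrix given as a list of rows."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    m, n = len(rows), len(rows[0])
    pivot_row = 0
    for col in range(n):
        # Look for a nonzero entry in this column, at or below the current pivot row.
        pivot = next((r for r in range(pivot_row, m) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        # Scale the pivot row so its leading entry is 1.
        lead = rows[pivot_row][col]
        rows[pivot_row] = [x / lead for x in rows[pivot_row]]
        # Clear every other entry of the pivot column.
        for r in range(m):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == m:
            break
    return rows


def is_nonsingular(matrix):
    """Apply Theorem NMRRI: nonsingular exactly when the RREF is the identity."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("singular/nonsingular only applies to square matrices")
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    return rref(matrix) == identity


archetype_a = [[1, -1, 2], [2, 1, 1], [1, 1, 0]]
archetype_b = [[-7, -6, -12], [5, 5, 7], [1, 0, 4]]

print(is_nonsingular(archetype_a))  # False
print(is_nonsingular(archetype_b))  # True
```

Passing a matrix that is not square raises an error instead of returning an answer, mirroring the remark that the terms singular and nonsingular do not apply to such matrices.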


Example  NSR  Nonsingular  matrix,  row-reduced 

We have the coefficient matrix for Archetype B and a row-equivalent matrix $B$ in
reduced row-echelon form,

$$
A = \begin{bmatrix} -7 & -6 & -12 \\ 5 & 5 & 7 \\ 1 & 0 & 4 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = B
$$

Since $B$ is the $3\times 3$ identity matrix, Theorem NMRRI tells us that $A$ is a
nonsingular matrix. △

Subsection NSNM

Null Space of a Nonsingular Matrix

Nonsingular  matrices  and  their  null  spaces  are  intimately  related,  as  the  next  two 
examples  illustrate. 

Example  NSS  Null  space  of  a singular  matrix 

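Given the singular coefficient matrix from Archetype A, the null space is the set of
solutions to the homogeneous system of equations $\mathcal{LS}(A, \mathbf{0})$, which has a solution
set and null space constructed in Example HISAA as an infinite set of vectors.

$$
A = \begin{bmatrix} 1 & -1 & 2 \\ 2 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}
\qquad
\mathcal{N}(A) = \left\{ \begin{bmatrix} -x_3 \\ x_3 \\ x_3 \end{bmatrix} \,\middle|\, x_3 \in \mathbb{C} \right\}
$$

△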

Example  NSNM  Null  space  of  a nonsingular  matrix 

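Given the nonsingular coefficient matrix from Archetype B, the solution set to the
homogeneous system $\mathcal{LS}(A, \mathbf{0})$ is constructed in Example HUSAB and contains only
the trivial solution, so the null space of $A$ has only a single element,

$$
A = \begin{bmatrix} -7 & -6 & -12 \\ 5 & 5 & 7 \\ 1 & 0 & 4 \end{bmatrix}
\qquad
\mathcal{N}(A) = \left\{ \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \right\}
$$

△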


These  two  examples  illustrate  the  next  theorem,  which  is  another  equivalence. 

Theorem  NMTNS  Nonsingular  Matrices  have  Trivial  Null  Spaces 

Suppose that $A$ is a square matrix. Then $A$ is nonsingular if and only if the null
space of $A$ is the set containing only the zero vector, i.e. $\mathcal{N}(A) = \{\mathbf{0}\}$.

Proof. The null space of a square matrix, $A$, is equal to the set of solutions to the
homogeneous system, $\mathcal{LS}(A, \mathbf{0})$. A matrix is nonsingular if and only if the set of
solutions to the homogeneous system, $\mathcal{LS}(A, \mathbf{0})$, has only a trivial solution. These
two observations may be chained together to construct the two proofs necessary for
each half of this theorem. ■
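
The contrast between the two null spaces can be spot-checked numerically. The sketch below is only an illustration, with hypothetical helper names `left_hand_sides` and `in_null_space`: it evaluates the left-hand side of each equation of the homogeneous system at a candidate vector and reports whether every result is zero. A finite number of checks is of course not a proof of either example.

```python
def left_hand_sides(matrix, vector):
    """Evaluate the left side of each equation of the system at the given vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def in_null_space(matrix, vector):
    """A vector is in the null space when it solves the homogeneous system."""
    return all(value == 0 for value in left_hand_sides(matrix, vector))


archetype_a = [[1, -1, 2], [2, 1, 1], [1, 1, 0]]
archetype_b = [[-7, -6, -12], [5, 5, 7], [1, 0, 4]]

# Every vector of the form (-x3, x3, x3) solves the homogeneous system for Archetype A.
for x3 in (-2, 0, 1, 7):
    print(x3, in_null_space(archetype_a, [-x3, x3, x3]))  # True each time

# For Archetype B the zero vector works, but this nonzero candidate does not.
print(in_null_space(archetype_b, [0, 0, 0]))   # True
print(in_null_space(archetype_b, [-1, 1, 1]))  # False
```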


The  next  theorem  pulls  a lot  of  big  ideas  together.  Theorem  NMUS  tells  us  that 
we  can  learn  much  about  solutions  to  a system  of  linear  equations  with  a square 
coefficient  matrix  by  just  examining  a similar  homogeneous  system. 

Theorem  NMUS  Nonsingular  Matrices  and  Unique  Solutions 

Suppose that $A$ is a square matrix. $A$ is a nonsingular matrix if and only if the
system $\mathcal{LS}(A, \mathbf{b})$ has a unique solution for every choice of the constant vector $\mathbf{b}$.

Proof. ($\Leftarrow$) The hypothesis for this half of the proof is that the system $\mathcal{LS}(A, \mathbf{b})$
has a unique solution for every choice of the constant vector $\mathbf{b}$. We will make a very
specific choice for $\mathbf{b}$: $\mathbf{b} = \mathbf{0}$. Then we know that the system $\mathcal{LS}(A, \mathbf{0})$ has a unique
solution. But this is precisely the definition of what it means for $A$ to be nonsingular
(Definition NM). That almost seems too easy! Notice that we have not used the full
power of our hypothesis, but there is nothing that says we must use a hypothesis to
its fullest.

($\Rightarrow$) We assume that $A$ is nonsingular of size $n \times n$, so we know there is a
sequence of row operations that will convert $A$ into the identity matrix $I_n$ (Theorem
NMRRI). Form the augmented matrix $A' = [A \mid \mathbf{b}]$ and apply this same sequence
of row operations to $A'$. The result will be the matrix $B' = [I_n \mid \mathbf{c}]$, which is in
reduced row-echelon form with $r = n$. Then the augmented matrix $B'$ represents the
(extremely simple) system of equations $x_i = [\mathbf{c}]_i$, $1 \leq i \leq n$. The vector $\mathbf{c}$ is clearly a
solution, so the system is consistent (Definition CS). With a consistent system, we
use Theorem FVCS to count free variables. We find that there are $n - r = n - n = 0$
free variables, and so we therefore know that the solution is unique. (This half of
the proof was suggested by Asa Scherer.) ■

This  theorem  helps  to  explain  part  of  our  interest  in  nonsingular  matrices.  If  a 
matrix  is  nonsingular,  then  no  matter  what  vector  of  constants  we  pair  it  with,  using 
the  matrix  as  the  coefficient  matrix  will  always  yield  a linear  system  of  equations 
with  a solution,  and  the  solution  is  unique.  To  determine  if  a matrix  has  this  property 
(nonsingularity)  it  is  enough  to  just  solve  one  linear  system,  the  homogeneous  system 
with  the  matrix  as  coefficient  matrix  and  the  zero  vector  as  the  vector  of  constants 
(or  any  other  vector  of  constants,  see  Exercise  MM.T10). 
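
The second half of the proof of Theorem NMUS is constructive: row-reduce $[A \mid \mathbf{b}]$ to $[I_n \mid \mathbf{c}]$ and read off $\mathbf{c}$. The following sketch follows that recipe under stated assumptions: `solve_unique` is a hypothetical name, exact `Fraction` arithmetic is used, and the function simply gives up with an error when no pivot can be found, which for a square matrix means it is singular. The vector of constants for Archetype B is taken from Example VESE.

```python
from fractions import Fraction


def solve_unique(a, b):
    """Row-reduce [A | b] to [I | c] and return c; fail if A is singular."""
    n = len(a)
    aug = [[Fraction(x) for x in row] + [Fraction(c)] for row, c in zip(a, b)]
    for col in range(n):
        # A nonsingular matrix has a pivot in every column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("coefficient matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The final column of [I | c] is the unique solution.
    return [row[n] for row in aug]


archetype_b = [[-7, -6, -12], [5, 5, 7], [1, 0, 4]]

print(solve_unique(archetype_b, [-33, 24, 5]) == [-3, 5, 2])  # True
print(solve_unique(archetype_b, [0, 0, 0]) == [0, 0, 0])      # True
```

Any other vector of constants can be substituted for the second argument; with a nonsingular coefficient matrix the function always returns exactly one solution.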

Formulating the negation of the second part of this theorem is a good exercise.
A singular matrix has the property that for some value of the vector $\mathbf{b}$, the system
$\mathcal{LS}(A, \mathbf{b})$ does not have a unique solution (which means that it has no solution or
infinitely many solutions). We will be able to say more about this case later (see the
discussion following Theorem PSPHS).

Square  matrices  that  are  nonsingular  have  a long  list  of  interesting  properties, 
which  we  will  start  to  catalog  in  the  following,  recurring,  theorem.  Of  course,  singular 
matrices  will  then  have  all  of  the  opposite  properties.  The  following  theorem  is  a list 
of  equivalences. 

We  want  to  understand  just  what  is  involved  with  understanding  and  proving 
a theorem  that  says  several  conditions  are  equivalent.  So  have  a look  at  Proof 
Technique  ME  before  studying  the  first  in  this  series  of  theorems. 

Theorem NME1 Nonsingular Matrix Equivalences, Round 1
Suppose that $A$ is a square matrix. The following are equivalent.

1. $A$ is nonsingular.

2. $A$ row-reduces to the identity matrix.

3. The null space of $A$ contains only the zero vector, $\mathcal{N}(A) = \{\mathbf{0}\}$.

4. The linear system $\mathcal{LS}(A, \mathbf{b})$ has a unique solution for every possible choice of $\mathbf{b}$.

Proof. The statement that $A$ is nonsingular is equivalent to each of the subsequent
statements by, in turn, Theorem NMRRI, Theorem NMTNS and Theorem NMUS.
So the statement of this theorem is just a convenient way to organize all these results. ■

Finally,  you  may  have  wondered  why  we  refer  to  a matrix  as  nonsingular  when 
it  creates  systems  of  equations  with  single  solutions  (Theorem  NMUS)!  I have 
wondered  the  same  thing.  We  will  have  an  opportunity  to  address  this  when  we  get 
to  Theorem  SMZD.  Can  you  wait  that  long? 

Reading  Questions 

1.  In  your  own  words  state  the  definition  of  a nonsingular  matrix. 

2.  What  is  the  easiest  way  to  recognize  if  a square  matrix  is  nonsingular  or  not? 

3.  Suppose  we  have  a system  of  equations  and  its  coefficient  matrix  is  nonsingular.  What 
can  you  say  about  the  solution  set  for  this  system? 

Exercises 


In  Exercises  C30-C33  determine  if  the  matrix  is  nonsingular  or  singular.  Give  reasons  for 
your  answer. 

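C30†

$$
\begin{bmatrix} -3 & 1 & 2 & 8 \\ 2 & 0 & 3 & 4 \\ 1 & 2 & 7 & -4 \\ 5 & -1 & 2 & 0 \end{bmatrix}
$$

C31†

$$
\begin{bmatrix} 2 & 3 & 1 & 4 \\ 1 & 1 & 1 & 0 \\ -1 & 2 & 3 & 5 \\ 1 & 2 & 1 & 3 \end{bmatrix}
$$

C32†

$$
\begin{bmatrix} 9 & 3 & 2 & 4 \\ 5 & -6 & 1 & 3 \\ 4 & 1 & 3 & -5 \end{bmatrix}
$$

C33†

$$
\begin{bmatrix} -1 & 2 & 0 & 3 \\ 1 & -3 & -2 & 4 \\ -2 & 0 & 4 & 3 \\ -3 & 1 & -2 & 3 \end{bmatrix}
$$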

C40  Each  of  the  archetypes  below  is  a system  of  equations  with  a square  coefficient 
matrix,  or  is  itself  a square  matrix.  Determine  if  these  matrices  are  nonsingular,  or  singular. 
Comment  on  the  null  space  of  each  matrix. 


Archetype  A,  Archetype  B,  Archetype  F,  Archetype  K,  Archetype  L 

C50† Find the null space of the matrix $E$ below.

$$
E = \begin{bmatrix} 2 & 1 & -1 & -9 \\ 2 & 2 & -6 & -6 \\ 1 & 2 & -8 & 0 \\ -1 & 2 & -12 & 12 \end{bmatrix}
$$

M30† Let $A$ be the coefficient matrix of the system of equations below. Is $A$ nonsingular
or singular? Explain what you could infer about the solution set for the system based only
on what you have learned about $A$ being singular or nonsingular.

$$
\begin{aligned}
-x_1 + 5x_2 &= -8 \\
-2x_1 + 5x_2 + 5x_3 + 2x_4 &= 9 \\
-3x_1 - x_2 + 3x_3 + x_4 &= 3 \\
7x_1 + 6x_2 + 5x_3 + x_4 &= 30
\end{aligned}
$$


For  Exercises  M51-M52  say  as  much  as  possible  about  each  system’s  solution  set.  Be 
sure  to  make  it  clear  which  theorems  you  are  using  to  reach  your  conclusions. 

M51† 6 equations in 6 variables, singular coefficient matrix.

M52† A system with a nonsingular coefficient matrix, not homogeneous.

T10† Suppose that $A$ is a square matrix, and $B$ is a matrix in reduced row-echelon form
that is row-equivalent to $A$. Prove that if $A$ is singular, then the last row of $B$ is a zero row.

T12 Using (Definition RREF) and (Definition IM) carefully, give a proof of the following
equivalence: $A$ is a square matrix in reduced row-echelon form where every column is a
pivot column if and only if $A$ is the identity matrix.

T30† Suppose that $A$ is a nonsingular matrix and $A$ is row-equivalent to the matrix $B$.
Prove that $B$ is nonsingular.

T31† Suppose that $A$ is a square matrix of size $n \times n$ and that we know there is a single
vector $\mathbf{b} \in \mathbb{C}^n$ such that the system $\mathcal{LS}(A, \mathbf{b})$ has a unique solution. Prove that $A$ is a
nonsingular matrix. (Notice that this is very similar to Theorem NMUS, but is not exactly
the same.)

T90† Provide an alternative for the second half of the proof of Theorem NMUS, without
appealing to properties of the reduced row-echelon form of the coefficient matrix. In other
words, prove that if $A$ is nonsingular, then $\mathcal{LS}(A, \mathbf{b})$ has a unique solution for every choice
of the constant vector $\mathbf{b}$. Construct this proof without using Theorem REMEF or Theorem
RREFU.


Chapter  V 
Vectors 


We  have  worked  extensively  in  the  last  chapter  with  matrices,  and  some  with  vectors. 
In  this  chapter  we  will  develop  the  properties  of  vectors,  while  preparing  to  study 
vector  spaces  (Chapter  VS).  Initially  we  will  depart  from  our  study  of  systems 
of  linear  equations,  but  in  Section  LC  we  will  forge  a connection  between  linear 
combinations  and  systems  of  linear  equations  in  Theorem  SLSLC.  This  connection 
will  allow  us  to  understand  systems  of  linear  equations  at  a higher  level,  while 
consequently  discussing  them  less  frequently. 


Section  VO 
Vector  Operations 

In  this  section  we  define  some  new  operations  involving  vectors,  and  collect  some 
basic  properties  of  these  operations.  Begin  by  recalling  our  definition  of  a column 
vector  as  an  ordered  list  of  complex  numbers,  written  vertically  (Definition  CV). 
The  collection  of  all  possible  vectors  of  a fixed  size  is  a commonly  used  set,  so  we 
start  with  its  definition. 

Subsection  CV 
Column  Vectors 

Definition  VSCV  Vector  Space  of  Column  Vectors 

The vector space $\mathbb{C}^m$ is the set of all column vectors (Definition CV) of size $m$ with
entries from the set of complex numbers, $\mathbb{C}$. □

When a set similar to this is defined using only column vectors where all the
entries are from the real numbers, it is written as $\mathbb{R}^m$ and is known as Euclidean
$m$-space.

The term vector is used in a variety of different ways. We have defined it as
an ordered list written vertically. It could simply be an ordered list of numbers,
and perhaps written as $(2,\, 3,\, -1,\, 6)$. Or it could be interpreted as a point in $m$
dimensions, such as $(3,\, 4,\, -2)$ representing a point in three dimensions relative to $x$,
$y$ and $z$ axes. With an interpretation as a point, we can construct an arrow from the
origin to the point which is consistent with the notion that a vector has direction
and magnitude.

All  of  these  ideas  can  be  shown  to  be  related  and  equivalent,  so  keep  that  in  mind 
as  you  connect  the  ideas  of  this  course  with  ideas  from  other  disciplines.  For  now, 
we  will  stick  with  the  idea  that  a vector  is  just  a list  of  numbers,  in  some  particular 
order. 

Subsection  VEASM 

Vector  Equality,  Addition,  Scalar  Multiplication 

We  start  our  study  of  this  set  by  first  defining  what  it  means  for  two  vectors  to  be 
the  same. 

Definition CVE Column Vector Equality

Suppose that $\mathbf{u}, \mathbf{v} \in \mathbb{C}^m$. Then $\mathbf{u}$ and $\mathbf{v}$ are equal, written $\mathbf{u} = \mathbf{v}$, if

$$
[\mathbf{u}]_i = [\mathbf{v}]_i \qquad 1 \leq i \leq m
$$

□

Now this may seem like a silly (or even stupid) thing to say so carefully. Of
course two vectors are equal if they are equal for each corresponding entry! Well,
this is not as silly as it appears. We will see a few occasions later where the obvious
definition is not the right one. And besides, in doing mathematics we need to be very
careful about making all the necessary definitions and making them unambiguous.
And we have done that here.

Notice that the symbol “=” is now doing triple-duty. We know from our
earlier education what it means for two numbers (real or complex) to be equal, and
we take this for granted. In Definition SE we defined what it meant for two sets
to be equal. Now we have defined what it means for two vectors to be equal, and
that definition builds on our definition for when two numbers are equal when we
use the condition $u_i = v_i$ for all $1 \leq i \leq m$. So think carefully about your objects
when you see an equal sign and think about just which notion of equality you have
encountered. This will be especially important when you are asked to construct
proofs whose conclusion states that two objects are equal. If you have an electronic
copy of the book, such as the PDF version, searching on “Definition CVE” can be
an instructive exercise. See how often, and where, the definition is employed.

OK, let us do an example of vector equality that begins to hint at the utility of
this definition.

Example VESE Vector equality for a system of equations
Consider the system of linear equations in Archetype B,

$$
\begin{aligned}
-7x_1 - 6x_2 - 12x_3 &= -33 \\
5x_1 + 5x_2 + 7x_3 &= 24 \\
x_1 + 4x_3 &= 5
\end{aligned}
$$

Note the use of three equals signs; each indicates an equality of numbers (the
linear expressions are numbers when we evaluate them with fixed values of the
variable quantities). Now write the vector equality,

$$
\begin{bmatrix} -7x_1 - 6x_2 - 12x_3 \\ 5x_1 + 5x_2 + 7x_3 \\ x_1 + 4x_3 \end{bmatrix}
=
\begin{bmatrix} -33 \\ 24 \\ 5 \end{bmatrix}
$$

By Definition CVE, this single equality (of two column vectors) translates into three
simultaneous equalities of numbers that form the system of equations. So with this
new notion of vector equality we can become less reliant on referring to systems of
simultaneous equations. There is more to vector equality than just this, but this is a
good example for starters and we will develop it further. △

We will now define two operations on the set $\mathbb{C}^m$. By this we mean well-defined
procedures that somehow convert vectors into other vectors. Here are two of the
most basic definitions of the entire course.


Definition CVA Column Vector Addition

Suppose that $\mathbf{u}, \mathbf{v} \in \mathbb{C}^m$. The sum of $\mathbf{u}$ and $\mathbf{v}$ is the vector $\mathbf{u} + \mathbf{v}$ defined by

$$
[\mathbf{u} + \mathbf{v}]_i = [\mathbf{u}]_i + [\mathbf{v}]_i \qquad 1 \leq i \leq m
$$

□


So vector addition takes two vectors of the same size and combines them (in a
natural way!) to create a new vector of the same size. Notice that this definition
is required, even if we agree that this is the obvious, right, natural or correct way
to do it. Notice too that the symbol “+” is being recycled. We all know how to add
numbers, but now we have the same symbol extended to double-duty and we use
it to indicate how to add two new objects, vectors. And this definition of our new
meaning is built on our previous meaning of addition via the expressions $u_i + v_i$.
Think about your objects, especially when doing proofs. Vector addition is easy; here
is an example from $\mathbb{C}^4$.

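Example VA Addition of two vectors in $\mathbb{C}^4$

If

$$
\mathbf{u} = \begin{bmatrix} 2 \\ -3 \\ 4 \\ 2 \end{bmatrix}
\qquad
\mathbf{v} = \begin{bmatrix} -1 \\ 5 \\ 2 \\ -7 \end{bmatrix}
$$

then

$$
\mathbf{u} + \mathbf{v}
= \begin{bmatrix} 2 \\ -3 \\ 4 \\ 2 \end{bmatrix} + \begin{bmatrix} -1 \\ 5 \\ 2 \\ -7 \end{bmatrix}
= \begin{bmatrix} 2 + (-1) \\ -3 + 5 \\ 4 + 2 \\ 2 + (-7) \end{bmatrix}
= \begin{bmatrix} 1 \\ 2 \\ 6 \\ -5 \end{bmatrix}
$$

△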
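
Definition CVA translates directly into code. The sketch below assumes, purely for illustration, that a column vector is stored as a Python list of its entries; `vec_add` is a hypothetical name. It reproduces the sum from Example VA and refuses to add vectors of different sizes, since the definition only covers two vectors from the same $\mathbb{C}^m$.

```python
def vec_add(u, v):
    """Entry-by-entry sum of two column vectors of the same size (Definition CVA)."""
    if len(u) != len(v):
        raise ValueError("vector addition needs two vectors of the same size")
    return [ui + vi for ui, vi in zip(u, v)]


u = [2, -3, 4, 2]
v = [-1, 5, 2, -7]
print(vec_add(u, v))  # [1, 2, 6, -5]
```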


Our  second  operation  takes  two  objects  of  different  types,  specifically  a number 
and  a vector,  and  combines  them  to  create  another  vector.  In  this  context  we  call  a 
number  a scalar  in  order  to  emphasize  that  it  is  not  a vector. 


Definition  CVSM  Column  Vector  Scalar  Multiplication 

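Suppose $\mathbf{u} \in \mathbb{C}^m$ and $\alpha \in \mathbb{C}$, then the scalar multiple of $\mathbf{u}$ by $\alpha$ is the vector $\alpha\mathbf{u}$
defined by

$$
[\alpha\mathbf{u}]_i = \alpha\,[\mathbf{u}]_i \qquad 1 \leq i \leq m
$$

□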


Notice  that  we  are  doing  a kind  of  multiplication  here,  but  we  are  defining  a new 
type,  perhaps  in  what  appears  to  be  a natural  way.  We  use  juxtaposition  (smashing 
two  symbols  together  side-by-side)  to  denote  this  operation  rather  than  using  a 
symbol  like  we  did  with  vector  addition.  So  this  can  be  another  source  of  confusion. 
When  two  symbols  are  next  to  each  other,  are  we  doing  regular  old  multiplication, 
the  kind  we  have  done  for  years,  or  are  we  doing  scalar  vector  multiplication,  the 
operation  we  just  defined?  Think  about  your  objects  — if  the  first  object  is  a scalar, 
and  the  second  is  a vector,  then  it  must  be  that  we  are  doing  our  new  operation, 
and  the  result  of  this  operation  will  be  another  vector. 

Notice how consistency in notation can be an aid here. If we write scalars as
lower case Greek letters from the start of the alphabet (such as $\alpha$, $\beta$, ...) and write
vectors in bold Latin letters from the end of the alphabet ($\mathbf{u}$, $\mathbf{v}$, ...), then we have
some hints about what type of objects we are working with. This can be a blessing
and a curse, since when we go read another book about linear algebra, or read an
application in another discipline (physics, economics, ...) the types of notation
employed may be very different and hence unfamiliar.

Again,  computationally,  vector  scalar  multiplication  is  very  easy. 


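Example CVSM Scalar multiplication in $\mathbb{C}^5$
If

$$
\mathbf{u} = \begin{bmatrix} 3 \\ 1 \\ -2 \\ 4 \\ -1 \end{bmatrix}
$$

and $\alpha = 6$, then

$$
\alpha\mathbf{u}
= 6\begin{bmatrix} 3 \\ 1 \\ -2 \\ 4 \\ -1 \end{bmatrix}
= \begin{bmatrix} 6(3) \\ 6(1) \\ 6(-2) \\ 6(4) \\ 6(-1) \end{bmatrix}
= \begin{bmatrix} 18 \\ 6 \\ -12 \\ 24 \\ -6 \end{bmatrix}
$$

△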
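
Scalar multiplication is just as short to sketch as vector addition. As before, storing a vector as a Python list and the name `scalar_mul` are assumptions made only for this illustration. The second call uses Python's built-in complex numbers (written with `j`) as a reminder that scalars and entries come from $\mathbb{C}$, not just the integers.

```python
def scalar_mul(alpha, u):
    """Multiply every entry of the vector u by the scalar alpha (Definition CVSM)."""
    return [alpha * ui for ui in u]


u = [3, 1, -2, 4, -1]
print(scalar_mul(6, u))  # [18, 6, -12, 24, -6]

# A complex scalar applied to a vector with complex entries.
print(scalar_mul(1j, [1, 1j]))
```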

Subsection  VSP 
Vector  Space  Properties 

With  definitions  of  vector  addition  and  scalar  multiplication  we  can  state,  and  prove, 
several  properties  of  each  operation,  and  some  properties  that  involve  their  interplay. 
We  now  collect  ten  of  them  here  for  later  reference. 

Theorem  VSPCV  Vector  Space  Properties  of  Column  Vectors 

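Suppose that $\mathbb{C}^m$ is the set of column vectors of size $m$ (Definition VSCV) with
addition and scalar multiplication as defined in Definition CVA and Definition CVSM.
Then

• ACC Additive Closure, Column Vectors
If $\mathbf{u}, \mathbf{v} \in \mathbb{C}^m$, then $\mathbf{u} + \mathbf{v} \in \mathbb{C}^m$.

• SCC Scalar Closure, Column Vectors
If $\alpha \in \mathbb{C}$ and $\mathbf{u} \in \mathbb{C}^m$, then $\alpha\mathbf{u} \in \mathbb{C}^m$.

• CC Commutativity, Column Vectors
If $\mathbf{u}, \mathbf{v} \in \mathbb{C}^m$, then $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$.

• AAC Additive Associativity, Column Vectors
If $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathbb{C}^m$, then $\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$.

• ZC Zero Vector, Column Vectors
There is a vector, $\mathbf{0}$, called the zero vector, such that $\mathbf{u} + \mathbf{0} = \mathbf{u}$ for all $\mathbf{u} \in \mathbb{C}^m$.

• AIC Additive Inverses, Column Vectors
If $\mathbf{u} \in \mathbb{C}^m$, then there exists a vector $-\mathbf{u} \in \mathbb{C}^m$ so that $\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$.

• SMAC Scalar Multiplication Associativity, Column Vectors
If $\alpha, \beta \in \mathbb{C}$ and $\mathbf{u} \in \mathbb{C}^m$, then $\alpha(\beta\mathbf{u}) = (\alpha\beta)\mathbf{u}$.

• DVAC Distributivity across Vector Addition, Column Vectors
If $\alpha \in \mathbb{C}$ and $\mathbf{u}, \mathbf{v} \in \mathbb{C}^m$, then $\alpha(\mathbf{u} + \mathbf{v}) = \alpha\mathbf{u} + \alpha\mathbf{v}$.

• DSAC Distributivity across Scalar Addition, Column Vectors
If $\alpha, \beta \in \mathbb{C}$ and $\mathbf{u} \in \mathbb{C}^m$, then $(\alpha + \beta)\mathbf{u} = \alpha\mathbf{u} + \beta\mathbf{u}$.

• OC One, Column Vectors
If $\mathbf{u} \in \mathbb{C}^m$, then $1\mathbf{u} = \mathbf{u}$.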


Proof. While some of these properties seem very obvious, they all require proof.
However, the proofs are not very interesting, and border on tedious. We will prove
one version of distributivity very carefully, and you can test your proof-building
skills on some of the others. We need to establish an equality, so we will do so by
beginning with one side of the equality and applying various definitions and theorems
(listed to the right of each step) to massage the expression from the left into the
expression on the right. Here we go with a proof of Property DSAC.

For $1 \leq i \leq m$,

$$
\begin{aligned}
[(\alpha + \beta)\mathbf{u}]_i &= (\alpha + \beta)\,[\mathbf{u}]_i && \text{Definition CVSM} \\
&= \alpha\,[\mathbf{u}]_i + \beta\,[\mathbf{u}]_i && \text{Property DCN} \\
&= [\alpha\mathbf{u}]_i + [\beta\mathbf{u}]_i && \text{Definition CVSM} \\
&= [\alpha\mathbf{u} + \beta\mathbf{u}]_i && \text{Definition CVA}
\end{aligned}
$$

Since the individual components of the vectors $(\alpha + \beta)\mathbf{u}$ and $\alpha\mathbf{u} + \beta\mathbf{u}$ are equal
for all $i$, $1 \leq i \leq m$, Definition CVE tells us the vectors are equal. ■
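
A numerical spot-check is no substitute for the proof above, but it can catch a misremembered identity quickly. The following sketch checks a few of the ten properties for one arbitrary choice of scalars and vectors. The helper names and the particular values are assumptions made for illustration; the entries are complex numbers with small integer parts so that Python's floating-point arithmetic is exact here.

```python
def vec_add(u, v):
    """Entry-by-entry sum (Definition CVA)."""
    return [ui + vi for ui, vi in zip(u, v)]


def scalar_mul(alpha, u):
    """Entry-by-entry scalar multiple (Definition CVSM)."""
    return [alpha * ui for ui in u]


alpha, beta = 2 + 3j, -1 + 4j
u = [1 + 1j, 5, -2j]
v = [4, -3 + 2j, 1 - 1j]
zero = [0, 0, 0]

# Property DSAC: (alpha + beta)u = alpha u + beta u
print(scalar_mul(alpha + beta, u) == vec_add(scalar_mul(alpha, u), scalar_mul(beta, u)))
# Property DVAC: alpha(u + v) = alpha u + alpha v
print(scalar_mul(alpha, vec_add(u, v)) == vec_add(scalar_mul(alpha, u), scalar_mul(alpha, v)))
# Property CC: u + v = v + u
print(vec_add(u, v) == vec_add(v, u))
# Property AIC, computing -u as (-1)u: u + (-u) = 0
print(vec_add(u, scalar_mul(-1, u)) == zero)
```

Each line prints `True` for these values. A single `False` would disprove an identity, while any number of `True` results only suggests it, which is why the entry-by-entry proof is still required.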

Many  of  the  conclusions  of  our  theorems  can  be  characterized  as  “identities,” 
especially  when  we  are  establishing  basic  properties  of  operations  such  as  those  in 
this  section.  Most  of  the  properties  listed  in  Theorem  VSPCV  are  examples.  So 
some  advice  about  the  style  we  use  for  proving  identities  is  appropriate  right  now. 
Have  a look  at  Proof  Technique  PI. 

Be careful with the notion of the vector $-\mathbf{u}$. This is a vector that we add to $\mathbf{u}$
so that the result is the particular vector $\mathbf{0}$. This is basically a property of vector
addition. It happens that we can compute $-\mathbf{u}$ using the other operation, scalar
multiplication. We can prove this directly by writing that

$$
[-\mathbf{u}]_i = -[\mathbf{u}]_i = (-1)\,[\mathbf{u}]_i = [(-1)\mathbf{u}]_i
$$

We will see later how to derive this property as a consequence of several of the ten
properties listed in Theorem VSPCV.

Similarly,  we  will  often  write  something  you  would  immediately  recognize  as 
“vector  subtraction.”  This  could  be  placed  on  a firm  theoretical  foundation  — as  you 
can  do  yourself  with  Exercise  VO.T30. 

A final note. Property AAC implies that we do not have to be careful about how
we “parenthesize” the addition of vectors. In other words, there is nothing to be
gained by writing $(\mathbf{u} + \mathbf{v}) + (\mathbf{w} + (\mathbf{x} + \mathbf{y}))$ rather than $\mathbf{u} + \mathbf{v} + \mathbf{w} + \mathbf{x} + \mathbf{y}$, since we
get the same result no matter which order we choose to perform the four additions.
So we will not be careful about using parentheses this way.

Reading  Questions 


1.  Where  have  you  seen  vectors  used  before  in  other  courses?  How  were  they  different? 

2.  In  words  only,  when  are  two  vectors  equal? 

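3. Perform the following computation with vector operations

$$
2\begin{bmatrix} 1 \\ 5 \\ 0 \end{bmatrix} + (-3)\begin{bmatrix} 7 \\ 6 \\ 5 \end{bmatrix}
$$

Exercises

C10† Compute

$$
4\begin{bmatrix} 2 \\ -3 \\ 4 \\ 1 \\ 0 \end{bmatrix}
+ (-2)\begin{bmatrix} 1 \\ 2 \\ -5 \\ 2 \\ 4 \end{bmatrix}
+ \begin{bmatrix} -1 \\ 3 \\ 0 \\ 1 \\ 2 \end{bmatrix}
$$
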
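C11† Solve the given vector equation for $x$, or explain why no solution exists:

$$
3\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} + 4\begin{bmatrix} 2 \\ 0 \\ x \end{bmatrix} = \begin{bmatrix} 11 \\ 6 \\ 17 \end{bmatrix}
$$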

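C12† Solve the given vector equation for $\alpha$, or explain why no solution exists:

$$
\alpha\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} + 4\begin{bmatrix} 3 \\ 4 \\ 2 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 4 \end{bmatrix}
$$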

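C13† Solve the given vector equation for $\alpha$, or explain why no solution exists:

$$
\alpha\begin{bmatrix} 3 \\ 2 \\ -2 \end{bmatrix} + \begin{bmatrix} 6 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 0 \\ -3 \\ 6 \end{bmatrix}
$$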

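C14† Find $\alpha$ and $\beta$ that solve the vector equation.

$$
\alpha\begin{bmatrix} 1 \\ 0 \end{bmatrix} + \beta\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}
$$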

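C15† Find $\alpha$ and $\beta$ that solve the vector equation.

$$
\alpha\begin{bmatrix} 2 \\ 1 \end{bmatrix} + \beta\begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 5 \\ 0 \end{bmatrix}
$$

T05† Provide reasons (mostly vector space properties) as justification for each of the
seven steps of the following proof.

Theorem For any vectors $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathbb{C}^m$, if $\mathbf{u} + \mathbf{v} = \mathbf{u} + \mathbf{w}$, then $\mathbf{v} = \mathbf{w}$.

Proof: Let $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathbb{C}^m$, and suppose $\mathbf{u} + \mathbf{v} = \mathbf{u} + \mathbf{w}$.

$$
\begin{aligned}
\mathbf{v} &= \mathbf{0} + \mathbf{v} \\
&= (-\mathbf{u} + \mathbf{u}) + \mathbf{v} \\
&= -\mathbf{u} + (\mathbf{u} + \mathbf{v}) \\
&= -\mathbf{u} + (\mathbf{u} + \mathbf{w}) \\
&= (-\mathbf{u} + \mathbf{u}) + \mathbf{w} \\
&= \mathbf{0} + \mathbf{w} \\
&= \mathbf{w}
\end{aligned}
$$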

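T06† Provide reasons (mostly vector space properties) as justification for each of the six
steps of the following proof.

Theorem For any vector $\mathbf{u} \in \mathbb{C}^m$, $0\mathbf{u} = \mathbf{0}$.

Proof: Let $\mathbf{u} \in \mathbb{C}^m$.

$$
\begin{aligned}
\mathbf{0} &= 0\mathbf{u} + (-0\mathbf{u}) \\
&= (0 + 0)\mathbf{u} + (-0\mathbf{u}) \\
&= (0\mathbf{u} + 0\mathbf{u}) + (-0\mathbf{u}) \\
&= 0\mathbf{u} + (0\mathbf{u} + (-0\mathbf{u})) \\
&= 0\mathbf{u} + \mathbf{0} \\
&= 0\mathbf{u}
\end{aligned}
$$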

T07† Provide reasons (mostly vector space properties) as justification for each of the six
steps of the following proof.

Theorem For any scalar $c$, $c\,\mathbf{0} = \mathbf{0}$.

Proof: Let $c$ be an arbitrary scalar.

$$
\begin{aligned}
\mathbf{0} &= c\mathbf{0} + (-c\mathbf{0}) \\
&= c(\mathbf{0} + \mathbf{0}) + (-c\mathbf{0}) \\
&= (c\mathbf{0} + c\mathbf{0}) + (-c\mathbf{0}) \\
&= c\mathbf{0} + (c\mathbf{0} + (-c\mathbf{0})) \\
&= c\mathbf{0} + \mathbf{0} \\
&= c\mathbf{0}
\end{aligned}
$$

T13† Prove Property CC of Theorem VSPCV. Write your proof in the style of the proof
of Property DSAC given in this section.

T17  Prove  Property  SMAC  of  Theorem  VSPCV.  Write  your  proof  in  the  style  of  the 
proof  of  Property  DSAC  given  in  this  section. 

T18  Prove  Property  DVAC  of  Theorem  VSPCV.  Write  your  proof  in  the  style  of  the 
proof  of  Property  DSAC  given  in  this  section. 

Exercises T30, T31 and T32 are about making a careful definition of “vector subtraction”.

T30 Suppose $\mathbf{u}$ and $\mathbf{v}$ are two vectors in $\mathbb{C}^m$. Define a new operation, called
“subtraction,” as the new vector denoted $\mathbf{u} - \mathbf{v}$ and defined by

$$
[\mathbf{u} - \mathbf{v}]_i = [\mathbf{u}]_i - [\mathbf{v}]_i \qquad 1 \leq i \leq m
$$

Prove that we can express the subtraction of two vectors in terms of our two basic
operations. More precisely, prove that $\mathbf{u} - \mathbf{v} = \mathbf{u} + (-1)\mathbf{v}$. So in a sense, subtraction is
not something new and different, but is just a convenience. Mimic the style of similar
proofs in this section.

T31 Prove, by giving counterexamples, that vector subtraction is not commutative
and not associative.

T32 Prove that vector subtraction obeys a distributive property. Specifically, prove
that $\alpha(\mathbf{u} - \mathbf{v}) = \alpha\mathbf{u} - \alpha\mathbf{v}$.

Can you give two different proofs? Distinguish your two proofs by using the alternate
descriptions of vector subtraction provided by Exercise VO.T30.

Section  LC 
Linear  Combinations 

In  Section  VO  we  defined  vector  addition  and  scalar  multiplication.  These  two 
operations  combine  nicely  to  give  us  a construction  known  as  a linear  combination, 
a construct  that  we  will  work  with  throughout  this  course. 


Subsection  LC 
Linear  Combinations 


Definition  LCCV  Linear  Combination  of  Column  Vectors 

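Given $n$ vectors $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_n$ from $\mathbb{C}^m$ and $n$ scalars $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_n$, their
linear combination is the vector

$$
\alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \cdots + \alpha_n\mathbf{u}_n
$$

□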


So  this  definition  takes  an  equal  number  of  scalars  and  vectors,  combines  them 
using  our  two  new  operations  (scalar  multiplication  and  vector  addition)  and  creates 
a single  brand-new  vector,  of  the  same  size  as  the  original  vectors.  When  a definition 
or  theorem  employs  a linear  combination,  think  about  the  nature  of  the  objects  that 
go  into  its  creation  (lists  of  scalars  and  vectors),  and  the  type  of  object  that  results 
(a  single  vector).  Computationally,  a linear  combination  is  pretty  easy. 

Example  TLC  Two  linear  combinations  in  C6 
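Suppose that

$$\alpha_1 = 1 \qquad \alpha_2 = -4 \qquad \alpha_3 = 2 \qquad \alpha_4 = -1$$

and

$$\mathbf{u}_1 = \begin{bmatrix} 2\\4\\-3\\1\\2\\9 \end{bmatrix} \qquad
\mathbf{u}_2 = \begin{bmatrix} 6\\3\\0\\-2\\1\\4 \end{bmatrix} \qquad
\mathbf{u}_3 = \begin{bmatrix} -5\\2\\1\\1\\-3\\0 \end{bmatrix} \qquad
\mathbf{u}_4 = \begin{bmatrix} 3\\2\\-5\\7\\1\\3 \end{bmatrix}$$

then their linear combination is

$$\begin{aligned}
\alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \alpha_4\mathbf{u}_4
&= (1)\begin{bmatrix} 2\\4\\-3\\1\\2\\9 \end{bmatrix}
+ (-4)\begin{bmatrix} 6\\3\\0\\-2\\1\\4 \end{bmatrix}
+ (2)\begin{bmatrix} -5\\2\\1\\1\\-3\\0 \end{bmatrix}
+ (-1)\begin{bmatrix} 3\\2\\-5\\7\\1\\3 \end{bmatrix} \\
&= \begin{bmatrix} 2\\4\\-3\\1\\2\\9 \end{bmatrix}
+ \begin{bmatrix} -24\\-12\\0\\8\\-4\\-16 \end{bmatrix}
+ \begin{bmatrix} -10\\4\\2\\2\\-6\\0 \end{bmatrix}
+ \begin{bmatrix} -3\\-2\\5\\-7\\-1\\-3 \end{bmatrix}
= \begin{bmatrix} -35\\-6\\4\\4\\-9\\-10 \end{bmatrix}
\end{aligned}$$
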

A different  linear  combination,  of  the  same  set  of  vectors,  can  be  formed  with 
different  scalars.  Take 


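$$\beta_1 = 3 \qquad \beta_2 = 0 \qquad \beta_3 = 5 \qquad \beta_4 = -1$$

and form the linear combination

$$\begin{aligned}
\beta_1\mathbf{u}_1 + \beta_2\mathbf{u}_2 + \beta_3\mathbf{u}_3 + \beta_4\mathbf{u}_4
&= (3)\begin{bmatrix} 2\\4\\-3\\1\\2\\9 \end{bmatrix}
+ (0)\begin{bmatrix} 6\\3\\0\\-2\\1\\4 \end{bmatrix}
+ (5)\begin{bmatrix} -5\\2\\1\\1\\-3\\0 \end{bmatrix}
+ (-1)\begin{bmatrix} 3\\2\\-5\\7\\1\\3 \end{bmatrix} \\
&= \begin{bmatrix} 6\\12\\-9\\3\\6\\27 \end{bmatrix}
+ \begin{bmatrix} 0\\0\\0\\0\\0\\0 \end{bmatrix}
+ \begin{bmatrix} -25\\10\\5\\5\\-15\\0 \end{bmatrix}
+ \begin{bmatrix} -3\\-2\\5\\-7\\-1\\-3 \end{bmatrix}
= \begin{bmatrix} -22\\20\\1\\1\\-10\\24 \end{bmatrix}
\end{aligned}$$
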

Notice how we could keep our set of vectors fixed, and use different sets of scalars
to construct different vectors. You might build a few new linear combinations of
$\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \mathbf{u}_4$ right now. We will be right here when you get back. What vectors
were you able to create? Do you think you could create the vector $\mathbf{w}$ with a “suitable”
choice of four scalars?

$$\mathbf{w} = \begin{bmatrix} 13\\15\\5\\-17\\2\\25 \end{bmatrix}$$

Do you think you could create any possible vector from $\mathbb{C}^6$ by choosing the proper
scalars? These last two questions are very fundamental, and time spent considering
them now will prove beneficial later. △
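
The arithmetic in Example TLC can be checked mechanically. The following is a minimal sketch in plain Python; the helper name `linear_combination` is an assumption made for illustration and is not part of the text or of Sage. It applies Definition LCCV entry by entry to the four vectors and the two lists of scalars from the example.

```python
def linear_combination(scalars, vectors):
    """Return scalars[0]*vectors[0] + ... + scalars[-1]*vectors[-1].

    Hypothetical helper: vectors are plain Python lists of equal size.
    """
    if len(scalars) != len(vectors):
        raise ValueError("need exactly one scalar per vector")
    size = len(vectors[0])
    if any(len(v) != size for v in vectors):
        raise ValueError("all vectors must have the same size")
    # Entry i of the result is the sum of (scalar * entry i) over all vectors.
    return [sum(a * v[i] for a, v in zip(scalars, vectors)) for i in range(size)]


# The four vectors of Example TLC.
u1 = [2, 4, -3, 1, 2, 9]
u2 = [6, 3, 0, -2, 1, 4]
u3 = [-5, 2, 1, 1, -3, 0]
u4 = [3, 2, -5, 7, 1, 3]
vectors = [u1, u2, u3, u4]

first = linear_combination([1, -4, 2, -1], vectors)   # the alpha scalars
second = linear_combination([3, 0, 5, -1], vectors)   # the beta scalars
print(first)
print(second)
```

Changing only the list of scalars while keeping `vectors` fixed is exactly the experiment the example suggests.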

Our  next  two  examples  are  key  ones,  and  a discussion  about  decompositions  is 
timely.  Have  a look  at  Proof  Technique  DC  before  studying  the  next  two  examples. 

Example ABLC  Archetype B as a linear combination
In this example we will rewrite Archetype B in the language of vectors, vector
equality and linear combinations. In Example VESE we wrote the system of $m = 3$
equations as the vector equality

$$\begin{bmatrix} -7x_1 - 6x_2 - 12x_3 \\ 5x_1 + 5x_2 + 7x_3 \\ x_1 + 4x_3 \end{bmatrix}
= \begin{bmatrix} -33 \\ 24 \\ 5 \end{bmatrix}$$


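Now we will bust up the linear expressions on the left, first using vector addition,

$$\begin{bmatrix} -7x_1 \\ 5x_1 \\ x_1 \end{bmatrix}
+ \begin{bmatrix} -6x_2 \\ 5x_2 \\ 0x_2 \end{bmatrix}
+ \begin{bmatrix} -12x_3 \\ 7x_3 \\ 4x_3 \end{bmatrix}
= \begin{bmatrix} -33 \\ 24 \\ 5 \end{bmatrix}$$
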

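Now we can rewrite each of these vectors as a scalar multiple of a fixed vector,
where the scalar is one of the unknown variables, converting the left-hand side into
a linear combination

$$x_1\begin{bmatrix} -7 \\ 5 \\ 1 \end{bmatrix}
+ x_2\begin{bmatrix} -6 \\ 5 \\ 0 \end{bmatrix}
+ x_3\begin{bmatrix} -12 \\ 7 \\ 4 \end{bmatrix}
= \begin{bmatrix} -33 \\ 24 \\ 5 \end{bmatrix}$$
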

We can now interpret the problem of solving the system of equations as determining
values for the scalar multiples that make the vector equation true. In the analysis
of Archetype B, we were able to determine that it had only one solution. A quick
way to see this is to row-reduce the coefficient matrix to the $3 \times 3$ identity matrix
and apply Theorem NMRRI to determine that the coefficient matrix is nonsingular.
Then Theorem NMUS tells us that the system of equations has a unique solution.
This solution is

$$x_1 = -3 \qquad x_2 = 5 \qquad x_3 = 2$$

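So, in the context of this example, we can express the fact that these values of
the variables are a solution by writing the linear combination,

$$(-3)\begin{bmatrix} -7 \\ 5 \\ 1 \end{bmatrix}
+ (5)\begin{bmatrix} -6 \\ 5 \\ 0 \end{bmatrix}
+ (2)\begin{bmatrix} -12 \\ 7 \\ 4 \end{bmatrix}
= \begin{bmatrix} -33 \\ 24 \\ 5 \end{bmatrix}$$
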

Furthermore,  these  are  the  only  three  scalars  that  will  accomplish  this  equality, 
since  they  come  from  a unique  solution. 

Notice how the three vectors in this example are the columns of the coefficient
matrix of the system of equations. This is our first hint of the important interplay
between the vectors that form the columns of a matrix, and the matrix itself. △

With  any  discussion  of  Archetype  A or  Archetype  B we  should  be  sure  to  contrast 
with  the  other. 

Example AALC  Archetype A as a linear combination
As a vector equality, Archetype A can be written as

$$\begin{bmatrix} x_1 - x_2 + 2x_3 \\ 2x_1 + x_2 + x_3 \\ x_1 + x_2 \end{bmatrix}
= \begin{bmatrix} 1 \\ 8 \\ 5 \end{bmatrix}$$


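Now bust up the linear expressions on the left, first using vector addition,

$$\begin{bmatrix} x_1 \\ 2x_1 \\ x_1 \end{bmatrix}
+ \begin{bmatrix} -x_2 \\ x_2 \\ x_2 \end{bmatrix}
+ \begin{bmatrix} 2x_3 \\ x_3 \\ 0x_3 \end{bmatrix}
= \begin{bmatrix} 1 \\ 8 \\ 5 \end{bmatrix}$$
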

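Rewrite each of these vectors as a scalar multiple of a fixed vector, where the
scalar is one of the unknown variables, converting the left-hand side into a linear
combination

$$x_1\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
+ x_2\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}
+ x_3\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}
= \begin{bmatrix} 1 \\ 8 \\ 5 \end{bmatrix}$$
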

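Row-reducing the augmented matrix for Archetype A leads to the conclusion
that the system is consistent and has free variables, hence infinitely many solutions.
So for example, the two solutions

$$x_1 = 2 \qquad x_2 = 3 \qquad x_3 = 1$$

$$x_1 = 3 \qquad x_2 = 2 \qquad x_3 = 0$$

can be used together to say that,

$$(2)\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
+ (3)\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}
+ (1)\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}
= \begin{bmatrix} 1 \\ 8 \\ 5 \end{bmatrix}
= (3)\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
+ (2)\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}
+ (0)\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}$$
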

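Ignore the middle of this equation, and move all the terms to the left-hand side,

$$(2)\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
+ (3)\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}
+ (1)\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}
+ (-3)\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
+ (-2)\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}
+ (-0)\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
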

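Regrouping gives

$$(-1)\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
+ (1)\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}
+ (1)\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
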

Notice that these three vectors are the columns of the coefficient matrix for the
system of equations in Archetype A. This equality says there is a linear combination
of those columns that equals the vector of all zeros. Give it some thought, but this
says that

$$x_1 = -1 \qquad x_2 = 1 \qquad x_3 = 1$$

is a nontrivial solution to the homogeneous system of equations with the coefficient
matrix for the original system in Archetype A. In particular, this demonstrates that
this coefficient matrix is singular. △
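
The regrouping step in Example AALC amounts to subtracting one solution from another. The sketch below, in plain Python with a hypothetical helper `linear_combination` that is not from the text, repeats that computation: the entrywise difference of the two listed solutions is used as a list of scalars on the columns of the coefficient matrix, and the result is the zero vector.

```python
def linear_combination(scalars, vectors):
    """Hypothetical helper: sum of scalars[k] * vectors[k], entry by entry."""
    size = len(vectors[0])
    return [sum(a * v[i] for a, v in zip(scalars, vectors)) for i in range(size)]


# Columns of the coefficient matrix of Archetype A, as in Example AALC.
columns = [[1, 2, 1], [-1, 1, 1], [2, 1, 0]]
constants = [1, 8, 5]

solution_1 = [2, 3, 1]
solution_2 = [3, 2, 0]

# Both solutions reproduce the vector of constants.
print(linear_combination(solution_1, columns) == constants)
print(linear_combination(solution_2, columns) == constants)

# Their difference gives scalars for a combination equal to the zero vector.
difference = [s - t for s, t in zip(solution_1, solution_2)]
print(difference)
print(linear_combination(difference, columns))
```

The difference is not the zero vector, which is what makes the resulting solution of the homogeneous system nontrivial.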

There  is  a lot  going  on  in  the  last  two  examples.  Come  back  to  them  in  a while  and 
make  some  connections  with  the  intervening  material.  For  now,  we  will  summarize 
and  explain  some  of  this  behavior  with  a theorem. 

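Theorem SLSLC  Solutions to Linear Systems are Linear Combinations
Denote the columns of the $m \times n$ matrix $A$ as the vectors $\mathbf{A}_1, \mathbf{A}_2, \mathbf{A}_3, \ldots, \mathbf{A}_n$.
Then $\mathbf{x} \in \mathbb{C}^n$ is a solution to the linear system of equations $\mathcal{LS}(A, \mathbf{b})$ if and only if
$\mathbf{b}$ equals the linear combination of the columns of $A$ formed with the entries of $\mathbf{x}$,

$$[\mathbf{x}]_1\mathbf{A}_1 + [\mathbf{x}]_2\mathbf{A}_2 + [\mathbf{x}]_3\mathbf{A}_3 + \cdots + [\mathbf{x}]_n\mathbf{A}_n = \mathbf{b}$$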

Proof. The proof of this theorem is as much about a change in notation as it is
about making logical deductions. Write the system of equations $\mathcal{LS}(A, \mathbf{b})$ as

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2n}x_n &= b_2 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \cdots + a_{3n}x_n &= b_3 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \cdots + a_{mn}x_n &= b_m
\end{aligned}$$

Notice then that the entry of the coefficient matrix $A$ in row $i$ and column $j$ has
two names: $a_{ij}$ as the coefficient of $x_j$ in equation $i$ of the system and $[\mathbf{A}_j]_i$ as the
$i$-th entry of the column vector in column $j$ of the coefficient matrix $A$. Likewise,
entry $i$ of $\mathbf{b}$ has two names: $b_i$ from the linear system and $[\mathbf{b}]_i$ as an entry of a
vector. Our theorem is an equivalence (Proof Technique E) so we need to prove both
“directions.”

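($\Leftarrow$) Suppose we have the vector equality between $\mathbf{b}$ and the linear combination
of the columns of $A$. Then for $1 \le i \le m$,

$$\begin{aligned}
b_i &= [\mathbf{b}]_i && \text{Definition CV} \\
&= \left[[\mathbf{x}]_1\mathbf{A}_1 + [\mathbf{x}]_2\mathbf{A}_2 + [\mathbf{x}]_3\mathbf{A}_3 + \cdots + [\mathbf{x}]_n\mathbf{A}_n\right]_i && \text{Hypothesis} \\
&= \left[[\mathbf{x}]_1\mathbf{A}_1\right]_i + \left[[\mathbf{x}]_2\mathbf{A}_2\right]_i + \left[[\mathbf{x}]_3\mathbf{A}_3\right]_i + \cdots + \left[[\mathbf{x}]_n\mathbf{A}_n\right]_i && \text{Definition CVA} \\
&= [\mathbf{x}]_1[\mathbf{A}_1]_i + [\mathbf{x}]_2[\mathbf{A}_2]_i + [\mathbf{x}]_3[\mathbf{A}_3]_i + \cdots + [\mathbf{x}]_n[\mathbf{A}_n]_i && \text{Definition CVSM} \\
&= [\mathbf{x}]_1 a_{i1} + [\mathbf{x}]_2 a_{i2} + [\mathbf{x}]_3 a_{i3} + \cdots + [\mathbf{x}]_n a_{in} && \text{Definition CV} \\
&= a_{i1}[\mathbf{x}]_1 + a_{i2}[\mathbf{x}]_2 + a_{i3}[\mathbf{x}]_3 + \cdots + a_{in}[\mathbf{x}]_n && \text{Property CMCN}
\end{aligned}$$

This says that the entries of $\mathbf{x}$ form a solution to equation $i$ of $\mathcal{LS}(A, \mathbf{b})$ for all
$1 \le i \le m$, in other words, $\mathbf{x}$ is a solution to $\mathcal{LS}(A, \mathbf{b})$.

($\Rightarrow$) Suppose now that $\mathbf{x}$ is a solution to the linear system $\mathcal{LS}(A, \mathbf{b})$. Then for
all $1 \le i \le m$,

$$\begin{aligned}
[\mathbf{b}]_i &= b_i && \text{Definition CV} \\
&= a_{i1}[\mathbf{x}]_1 + a_{i2}[\mathbf{x}]_2 + a_{i3}[\mathbf{x}]_3 + \cdots + a_{in}[\mathbf{x}]_n && \text{Hypothesis} \\
&= [\mathbf{x}]_1 a_{i1} + [\mathbf{x}]_2 a_{i2} + [\mathbf{x}]_3 a_{i3} + \cdots + [\mathbf{x}]_n a_{in} && \text{Property CMCN} \\
&= [\mathbf{x}]_1[\mathbf{A}_1]_i + [\mathbf{x}]_2[\mathbf{A}_2]_i + [\mathbf{x}]_3[\mathbf{A}_3]_i + \cdots + [\mathbf{x}]_n[\mathbf{A}_n]_i && \text{Definition CV} \\
&= \left[[\mathbf{x}]_1\mathbf{A}_1\right]_i + \left[[\mathbf{x}]_2\mathbf{A}_2\right]_i + \left[[\mathbf{x}]_3\mathbf{A}_3\right]_i + \cdots + \left[[\mathbf{x}]_n\mathbf{A}_n\right]_i && \text{Definition CVSM} \\
&= \left[[\mathbf{x}]_1\mathbf{A}_1 + [\mathbf{x}]_2\mathbf{A}_2 + [\mathbf{x}]_3\mathbf{A}_3 + \cdots + [\mathbf{x}]_n\mathbf{A}_n\right]_i && \text{Definition CVA}
\end{aligned}$$

So the entries of the vector $\mathbf{b}$, and the entries of the vector that is the linear
combination of the columns of $A$, agree for all $1 \le i \le m$. By Definition CVE we see
that the two vectors are equal, as desired. ■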


In other words, this theorem tells us that solutions to systems of equations are
linear combinations of the $n$ column vectors of the coefficient matrix ($\mathbf{A}_j$) which
yield the constant vector $\mathbf{b}$. Or said another way, a solution to a system of equations
$\mathcal{LS}(A, \mathbf{b})$ is an answer to the question “How can I form the vector $\mathbf{b}$ as a linear
combination of the columns of $A$?” Look through the archetypes that are systems
of equations and examine a few of the advertised solutions. In each case use the
solution to form a linear combination of the columns of the coefficient matrix and
verify that the result equals the constant vector (see Exercise LC.C21).
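
The verification suggested here is easy to automate. What follows is a minimal sketch in plain Python of the test described by Theorem SLSLC; the function names `combine_columns` and `is_solution` are assumptions chosen for illustration. A candidate vector is accepted exactly when the linear combination of the columns, using its entries as scalars, equals the vector of constants. The data are the coefficient matrices and constant vectors of Archetype B and Archetype A from the two examples above.

```python
def combine_columns(A, x):
    """Linear combination of the columns of A with the entries of x as scalars.

    A is a list of rows; zip(*A) yields the columns.
    """
    cols = [list(col) for col in zip(*A)]
    if len(cols) != len(x):
        raise ValueError("need one scalar per column")
    return [sum(x[j] * cols[j][i] for j in range(len(cols))) for i in range(len(A))]


def is_solution(A, x, b):
    """True exactly when b equals the column combination formed with x."""
    return combine_columns(A, x) == b


arch_B = [[-7, -6, -12], [5, 5, 7], [1, 0, 4]]
b_B = [-33, 24, 5]

arch_A = [[1, -1, 2], [2, 1, 1], [1, 1, 0]]
b_A = [1, 8, 5]

print(is_solution(arch_B, [-3, 5, 2], b_B))  # the unique solution of Archetype B
print(is_solution(arch_A, [2, 3, 1], b_A))   # one solution of Archetype A
print(is_solution(arch_A, [3, 2, 0], b_A))   # another solution of Archetype A
print(is_solution(arch_A, [1, 1, 1], b_A))   # not a solution
```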


Subsection  VFSS 

Vector  Form  of  Solution  Sets 

We have written solutions to systems of equations as column vectors. For example
Archetype B has the solution $x_1 = -3$, $x_2 = 5$, $x_3 = 2$ which we write as

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -3 \\ 5 \\ 2 \end{bmatrix}$$


Now,  we  will  use  column  vectors  and  linear  combinations  to  express  all  of  the 
solutions  to  a linear  system  of  equations  in  a compact  and  understandable  way. 
First,  here  are  two  examples  that  will  motivate  our  next  theorem.  This  is  a valuable 
technique,  almost  the  equal  of  row-reducing  a matrix,  so  be  sure  you  get  comfortable 
with  it  over  the  course  of  this  section. 

Example  VFSAD  Vector  form  of  solutions  for  Archetype  D 

Archetype D is a linear system of 3 equations in 4 variables. Row-reducing the
augmented matrix yields

$$\begin{bmatrix}
\boxed{1} & 0 & 3 & -2 & 4 \\
0 & \boxed{1} & 1 & -3 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}$$

and we see $r = 2$ pivot columns. Also, $D = \{1, 2\}$ so the dependent variables are
then $x_1$ and $x_2$. $F = \{3, 4, 5\}$ so the two free variables are $x_3$ and $x_4$. We will
express a generic solution for the system by two slightly different methods, though
both arrive at the same conclusion.

First, we will decompose (Proof Technique DC) a solution vector. Rearranging
each equation represented in the row-reduced form of the augmented matrix by
solving for the dependent variable in each row yields the vector equality,

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 4 - 3x_3 + 2x_4 \\ -x_3 + 3x_4 \\ x_3 \\ x_4 \end{bmatrix}$$


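Now we will use the definitions of column vector addition and scalar multiplication
to express this vector as a linear combination,

$$\begin{aligned}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
&= \begin{bmatrix} 4 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ \begin{bmatrix} -3x_3 \\ -x_3 \\ x_3 \\ 0 \end{bmatrix}
+ \begin{bmatrix} 2x_4 \\ 3x_4 \\ 0 \\ x_4 \end{bmatrix} && \text{Definition CVA} \\
&= \begin{bmatrix} 4 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_3\begin{bmatrix} -3 \\ -1 \\ 1 \\ 0 \end{bmatrix}
+ x_4\begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix} && \text{Definition CVSM}
\end{aligned}$$
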

We  will  develop  the  same  linear  combination  a bit  quicker,  using  three  steps. 
While  the  method  above  is  instructive,  the  method  below  will  be  our  preferred 
approach. 

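Step 1. Write the vector of variables as a fixed vector, plus a linear combination
of $n - r$ vectors, using the free variables as the scalars.

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} \\ \\ \\ \phantom{0} \end{bmatrix}
+ x_3\begin{bmatrix} \\ \\ \\ \phantom{0} \end{bmatrix}
+ x_4\begin{bmatrix} \\ \\ \\ \phantom{0} \end{bmatrix}$$
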

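Step 2. Use 0’s and 1’s to ensure equality for the entries of the vectors with
indices in $F$ (corresponding to the free variables).

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} \\ \\ 0 \\ 0 \end{bmatrix}
+ x_3\begin{bmatrix} \\ \\ 1 \\ 0 \end{bmatrix}
+ x_4\begin{bmatrix} \\ \\ 0 \\ 1 \end{bmatrix}$$
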

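Step 3. For each dependent variable, use the augmented matrix to formulate an
equation expressing the dependent variable as a constant plus multiples of the free
variables. Convert this equation into entries of the vectors that ensure equality for
each dependent variable, one at a time.

$$x_1 = 4 - 3x_3 + 2x_4 \quad\Rightarrow\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 4 \\ \\ 0 \\ 0 \end{bmatrix}
+ x_3\begin{bmatrix} -3 \\ \\ 1 \\ 0 \end{bmatrix}
+ x_4\begin{bmatrix} 2 \\ \\ 0 \\ 1 \end{bmatrix}$$

$$x_2 = 0 - 1x_3 + 3x_4 \quad\Rightarrow\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 4 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_3\begin{bmatrix} -3 \\ -1 \\ 1 \\ 0 \end{bmatrix}
+ x_4\begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix}$$
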

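This final form of a typical solution is especially pleasing and useful. For example,
we can build solutions quickly by choosing values for our free variables, and then
compute a linear combination. Such as

$$x_3 = 2,\ x_4 = -5 \quad\Rightarrow\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 4 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ (2)\begin{bmatrix} -3 \\ -1 \\ 1 \\ 0 \end{bmatrix}
+ (-5)\begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} -12 \\ -17 \\ 2 \\ -5 \end{bmatrix}$$

or,

$$x_3 = 1,\ x_4 = 3 \quad\Rightarrow\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 4 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ (1)\begin{bmatrix} -3 \\ -1 \\ 1 \\ 0 \end{bmatrix}
+ (3)\begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} 7 \\ 8 \\ 1 \\ 3 \end{bmatrix}$$
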

You  will  find  the  second  solution  listed  in  the  write-up  for  Archetype  D,  and  you 
might  check  the  first  solution  by  substituting  it  back  into  the  original  equations. 

While this form is useful for quickly creating solutions, it is even better because
it tells us exactly what every solution looks like. We know the solution set is infinite,
which is pretty big, but now we can say that a solution is some multiple of
$\begin{bmatrix} -3 \\ -1 \\ 1 \\ 0 \end{bmatrix}$
plus a multiple of
$\begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix}$
plus the fixed vector
$\begin{bmatrix} 4 \\ 0 \\ 0 \\ 0 \end{bmatrix}$.
Period. So it only takes us three
vectors to describe the entire infinite solution set, provided we also agree on how to
combine the three vectors into a linear combination. △
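
As a hedged illustration of how the vector form produces solutions, the sketch below (plain Python; the names `solution` and `satisfies_rref` are assumptions, not from the text) builds a solution of Archetype D from chosen values of the free variables and checks it against the rows of the row-reduced augmented matrix given in this example. Checking against the row-reduced system suffices because it is equivalent to the original system.

```python
# Fixed vector and the two vectors attached to the free variables x3 and x4.
c = [4, 0, 0, 0]
u1 = [-3, -1, 1, 0]
u2 = [2, 3, 0, 1]

# Row-reduced augmented matrix from Example VFSAD (last column = constants).
rref = [
    [1, 0, 3, -2, 4],
    [0, 1, 1, -3, 0],
    [0, 0, 0, 0, 0],
]


def solution(x3, x4):
    """Return c + x3*u1 + x4*u2 as a list of four entries."""
    return [c[i] + x3 * u1[i] + x4 * u2[i] for i in range(4)]


def satisfies_rref(x):
    """Check every equation represented by a row of the row-reduced matrix."""
    return all(sum(row[j] * x[j] for j in range(4)) == row[4] for row in rref)


for x3, x4 in [(2, -5), (1, 3), (0, 0), (10, -7)]:
    x = solution(x3, x4)
    print((x3, x4), x, satisfies_rref(x))
```

Any pair of values for the free variables passes the check, which is the content of the claim that the three vectors describe the whole solution set.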


This  is  such  an  important  and  fundamental  technique,  we  will  do  another  example. 
Example  VFS  Vector  form  of  solutions 

Consider a linear system of $m = 5$ equations in $n = 7$ variables, having the augmented
matrix $A$.

$$A = \begin{bmatrix}
2 & 1 & -1 & -2 & 2 & 1 & 5 & 21 \\
1 & 1 & -3 & 1 & 1 & 1 & 2 & -5 \\
1 & 2 & -8 & 5 & 1 & 1 & -6 & -15 \\
3 & 3 & -9 & 3 & 6 & 5 & 2 & -24 \\
-2 & -1 & 1 & 2 & 1 & 1 & -9 & -30
\end{bmatrix}$$

Row-reducing we obtain the matrix

$$B = \begin{bmatrix}
\boxed{1} & 0 & 2 & -3 & 0 & 0 & 9 & 15 \\
0 & \boxed{1} & -5 & 4 & 0 & 0 & -8 & -10 \\
0 & 0 & 0 & 0 & \boxed{1} & 0 & -6 & 11 \\
0 & 0 & 0 & 0 & 0 & \boxed{1} & 7 & -21 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$

and we see $r = 4$ pivot columns. Also, $D = \{1, 2, 5, 6\}$ so the dependent variables
are then $x_1$, $x_2$, $x_5$, and $x_6$. $F = \{3, 4, 7, 8\}$ so the $n - r = 3$ free variables are
$x_3$, $x_4$ and $x_7$. We will express a generic solution for the system by two different
methods: both a decomposition and a construction.

First, we will decompose (Proof Technique DC) a solution vector. Rearranging
each equation represented in the row-reduced form of the augmented matrix by
solving for the dependent variable in each row yields the vector equality,

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 15 - 2x_3 + 3x_4 - 9x_7 \\ -10 + 5x_3 - 4x_4 + 8x_7 \\ x_3 \\ x_4 \\ 11 + 6x_7 \\ -21 - 7x_7 \\ x_7 \end{bmatrix}$$


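Now we will use the definitions of column vector addition and scalar multiplication
to decompose this generic solution vector as a linear combination,

$$\begin{aligned}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
&= \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ 11 \\ -21 \\ 0 \end{bmatrix}
+ \begin{bmatrix} -2x_3 \\ 5x_3 \\ x_3 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ \begin{bmatrix} 3x_4 \\ -4x_4 \\ 0 \\ x_4 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ \begin{bmatrix} -9x_7 \\ 8x_7 \\ 0 \\ 0 \\ 6x_7 \\ -7x_7 \\ x_7 \end{bmatrix} && \text{Definition CVA} \\
&= \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ 11 \\ -21 \\ 0 \end{bmatrix}
+ x_3\begin{bmatrix} -2 \\ 5 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_4\begin{bmatrix} 3 \\ -4 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_7\begin{bmatrix} -9 \\ 8 \\ 0 \\ 0 \\ 6 \\ -7 \\ 1 \end{bmatrix} && \text{Definition CVSM}
\end{aligned}$$
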


We  will  now  develop  the  same  linear  combination  a bit  quicker,  using  three  steps. 
While  the  method  above  is  instructive,  the  method  below  will  be  our  preferred 
approach. 

Step  1.  Write  the  vector  of  variables  as  a fixed  vector,  plus  a linear  combination 
of  n — r vectors,  using  the  free  variables  as  the  scalars. 


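Step 2. Use 0’s and 1’s to ensure equality for the entries of the vectors with
indices in $F$ (corresponding to the free variables).

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} \\ \\ 0 \\ 0 \\ \\ \\ 0 \end{bmatrix}
+ x_3\begin{bmatrix} \\ \\ 1 \\ 0 \\ \\ \\ 0 \end{bmatrix}
+ x_4\begin{bmatrix} \\ \\ 0 \\ 1 \\ \\ \\ 0 \end{bmatrix}
+ x_7\begin{bmatrix} \\ \\ 0 \\ 0 \\ \\ \\ 1 \end{bmatrix}$$
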

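Step 3. For each dependent variable, use the augmented matrix to formulate an
equation expressing the dependent variable as a constant plus multiples of the free
variables. Convert this equation into entries of the vectors that ensure equality for
each dependent variable, one at a time.

$$x_1 = 15 - 2x_3 + 3x_4 - 9x_7 \quad\Rightarrow\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 15 \\ \\ 0 \\ 0 \\ \\ \\ 0 \end{bmatrix}
+ x_3\begin{bmatrix} -2 \\ \\ 1 \\ 0 \\ \\ \\ 0 \end{bmatrix}
+ x_4\begin{bmatrix} 3 \\ \\ 0 \\ 1 \\ \\ \\ 0 \end{bmatrix}
+ x_7\begin{bmatrix} -9 \\ \\ 0 \\ 0 \\ \\ \\ 1 \end{bmatrix}$$

$$x_2 = -10 + 5x_3 - 4x_4 + 8x_7 \quad\Rightarrow\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ \\ \\ 0 \end{bmatrix}
+ x_3\begin{bmatrix} -2 \\ 5 \\ 1 \\ 0 \\ \\ \\ 0 \end{bmatrix}
+ x_4\begin{bmatrix} 3 \\ -4 \\ 0 \\ 1 \\ \\ \\ 0 \end{bmatrix}
+ x_7\begin{bmatrix} -9 \\ 8 \\ 0 \\ 0 \\ \\ \\ 1 \end{bmatrix}$$

$$x_5 = 11 + 6x_7 \quad\Rightarrow\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ 11 \\ \\ 0 \end{bmatrix}
+ x_3\begin{bmatrix} -2 \\ 5 \\ 1 \\ 0 \\ 0 \\ \\ 0 \end{bmatrix}
+ x_4\begin{bmatrix} 3 \\ -4 \\ 0 \\ 1 \\ 0 \\ \\ 0 \end{bmatrix}
+ x_7\begin{bmatrix} -9 \\ 8 \\ 0 \\ 0 \\ 6 \\ \\ 1 \end{bmatrix}$$

$$x_6 = -21 - 7x_7 \quad\Rightarrow\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ 11 \\ -21 \\ 0 \end{bmatrix}
+ x_3\begin{bmatrix} -2 \\ 5 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_4\begin{bmatrix} 3 \\ -4 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_7\begin{bmatrix} -9 \\ 8 \\ 0 \\ 0 \\ 6 \\ -7 \\ 1 \end{bmatrix}$$
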

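This final form of a typical solution is especially pleasing and useful. For example,
we can build solutions quickly by choosing values for our free variables, and then
compute a linear combination. For example

$$x_3 = 2,\ x_4 = -4,\ x_7 = 3 \quad\Rightarrow\quad
\mathbf{x}
= \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ 11 \\ -21 \\ 0 \end{bmatrix}
+ (2)\begin{bmatrix} -2 \\ 5 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ (-4)\begin{bmatrix} 3 \\ -4 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ (3)\begin{bmatrix} -9 \\ 8 \\ 0 \\ 0 \\ 6 \\ -7 \\ 1 \end{bmatrix}
= \begin{bmatrix} -28 \\ 40 \\ 2 \\ -4 \\ 29 \\ -42 \\ 3 \end{bmatrix}$$

or perhaps,

$$x_3 = 5,\ x_4 = 2,\ x_7 = 1 \quad\Rightarrow\quad
\mathbf{x}
= \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ 11 \\ -21 \\ 0 \end{bmatrix}
+ (5)\begin{bmatrix} -2 \\ 5 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ (2)\begin{bmatrix} 3 \\ -4 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ (1)\begin{bmatrix} -9 \\ 8 \\ 0 \\ 0 \\ 6 \\ -7 \\ 1 \end{bmatrix}
= \begin{bmatrix} 2 \\ 15 \\ 5 \\ 2 \\ 17 \\ -28 \\ 1 \end{bmatrix}$$

or even,

$$x_3 = 0,\ x_4 = 0,\ x_7 = 0 \quad\Rightarrow\quad
\mathbf{x}
= \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ 11 \\ -21 \\ 0 \end{bmatrix}
+ (0)\begin{bmatrix} -2 \\ 5 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ (0)\begin{bmatrix} 3 \\ -4 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ (0)\begin{bmatrix} -9 \\ 8 \\ 0 \\ 0 \\ 6 \\ -7 \\ 1 \end{bmatrix}
= \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ 11 \\ -21 \\ 0 \end{bmatrix}$$
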

So we can compactly express all of the solutions to this linear system with just 4
fixed vectors, provided we agree how to combine them in a linear combination to
create solution vectors.

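Suppose you were told that the vector $\mathbf{w}$ below was a solution to this system of
equations. Could you turn the problem around and write $\mathbf{w}$ as a linear combination
of the four vectors $\mathbf{c}$, $\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$? (See Exercise LC.M11.)

$$\mathbf{w} = \begin{bmatrix} 100 \\ -75 \\ 7 \\ 9 \\ -37 \\ 35 \\ -8 \end{bmatrix} \qquad
\mathbf{c} = \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ 11 \\ -21 \\ 0 \end{bmatrix} \qquad
\mathbf{u}_1 = \begin{bmatrix} -2 \\ 5 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \qquad
\mathbf{u}_2 = \begin{bmatrix} 3 \\ -4 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} \qquad
\mathbf{u}_3 = \begin{bmatrix} -9 \\ 8 \\ 0 \\ 0 \\ 6 \\ -7 \\ 1 \end{bmatrix}$$
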

Did  you  think  a few  weeks  ago  that  you  could  so  quickly  and  easily  list  all  the 
solutions  to  a linear  system  of  5 equations  in  7 variables? 
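
One possible way to approach the question about $\mathbf{w}$, sketched in plain Python under the assumption that $\mathbf{w}$ really is a solution: because $\mathbf{c}$ has zeros in the free positions 3, 4 and 7, and each $\mathbf{u}_j$ has a single 1 there, the only scalars that can work are the entries of $\mathbf{w}$ in those positions. The code reads them off, rebuilds the linear combination, and compares it with $\mathbf{w}$. The variable names are illustrative only.

```python
c = [15, -10, 0, 0, 11, -21, 0]
u1 = [-2, 5, 1, 0, 0, 0, 0]
u2 = [3, -4, 0, 1, 0, 0, 0]
u3 = [-9, 8, 0, 0, 6, -7, 1]
w = [100, -75, 7, 9, -37, 35, -8]

free_indices = [3, 4, 7]  # 1-based indices of the free variables x3, x4, x7

# The free entries of w are the only candidates for the three scalars.
scalars = [w[f - 1] for f in free_indices]

# Rebuild c + a1*u1 + a2*u2 + a3*u3 entry by entry.
rebuilt = [
    c[i] + sum(a * u[i] for a, u in zip(scalars, [u1, u2, u3]))
    for i in range(len(c))
]

print(scalars)
print(rebuilt == w)
```

If the comparison had failed, $\mathbf{w}$ could not have been a solution, since no other choice of scalars matches the free entries.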

We will now formalize the last two (important) examples as a theorem. The
statement of this theorem is a bit scary, and the proof is scarier. For now, be sure to
convince yourself, by working through the examples and exercises, that the statement
just describes the procedure of the two immediately previous examples.

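Theorem VFSLS  Vector Form of Solutions to Linear Systems
Suppose that $[A \mid \mathbf{b}]$ is the augmented matrix for a consistent linear system $\mathcal{LS}(A, \mathbf{b})$
of $m$ equations in $n$ variables. Let $B$ be a row-equivalent $m \times (n + 1)$ matrix
in reduced row-echelon form. Suppose that $B$ has $r$ pivot columns, with indices
$D = \{d_1, d_2, d_3, \ldots, d_r\}$, while the $n - r$ non-pivot columns have indices in
$F = \{f_1, f_2, f_3, \ldots, f_{n-r}, n + 1\}$. Define vectors $\mathbf{c}$, $\mathbf{u}_j$, $1 \le j \le n - r$ of size $n$ by

$$[\mathbf{c}]_i = \begin{cases} 0 & \text{if } i \in F \\ [B]_{k,n+1} & \text{if } i \in D,\ i = d_k \end{cases}$$

$$[\mathbf{u}_j]_i = \begin{cases} 1 & \text{if } i \in F,\ i = f_j \\ 0 & \text{if } i \in F,\ i \ne f_j \\ -[B]_{k,f_j} & \text{if } i \in D,\ i = d_k \end{cases}$$

Then the set of solutions to the system of equations $\mathcal{LS}(A, \mathbf{b})$ is

$$S = \left\{\, \mathbf{c} + \alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \cdots + \alpha_{n-r}\mathbf{u}_{n-r} \;\middle|\; \alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_{n-r} \in \mathbb{C} \,\right\}$$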


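Proof. First, $\mathcal{LS}(A, \mathbf{b})$ is equivalent to the linear system of equations that has the
matrix $B$ as its augmented matrix (Theorem REMES), so we need only show that $S$
is the solution set for the system with $B$ as its augmented matrix. The conclusion of
this theorem is that the solution set is equal to the set $S$, so we will apply Definition
SE.

We begin by showing that every element of $S$ is indeed a solution to the system.
Let $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_{n-r}$ be one choice of the scalars used to describe elements of
$S$. So an arbitrary element of $S$, which we will consider as a proposed solution is

$$\mathbf{x} = \mathbf{c} + \alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \cdots + \alpha_{n-r}\mathbf{u}_{n-r}$$

When $r + 1 \le \ell \le m$, row $\ell$ of the matrix $B$ is a zero row, so the equation
represented by that row is always true, no matter which solution vector we propose.
So concentrate on rows representing equations $1 \le \ell \le r$. We evaluate equation $\ell$ of
the system represented by $B$ with the proposed solution vector $\mathbf{x}$ and refer to the
value of the left-hand side of the equation as $\beta_\ell$,

$$\beta_\ell = [B]_{\ell 1}[\mathbf{x}]_1 + [B]_{\ell 2}[\mathbf{x}]_2 + [B]_{\ell 3}[\mathbf{x}]_3 + \cdots + [B]_{\ell n}[\mathbf{x}]_n$$

Since $[B]_{\ell d_i} = 0$ for all $1 \le i \le r$, except that $[B]_{\ell d_\ell} = 1$, we see that $\beta_\ell$ simplifies
to

$$\beta_\ell = [\mathbf{x}]_{d_\ell} + [B]_{\ell f_1}[\mathbf{x}]_{f_1} + [B]_{\ell f_2}[\mathbf{x}]_{f_2} + [B]_{\ell f_3}[\mathbf{x}]_{f_3} + \cdots + [B]_{\ell f_{n-r}}[\mathbf{x}]_{f_{n-r}}$$

Notice that for $1 \le i \le n - r$

$$\begin{aligned}
[\mathbf{x}]_{f_i} &= [\mathbf{c}]_{f_i} + \alpha_1[\mathbf{u}_1]_{f_i} + \alpha_2[\mathbf{u}_2]_{f_i} + \cdots + \alpha_i[\mathbf{u}_i]_{f_i} + \cdots + \alpha_{n-r}[\mathbf{u}_{n-r}]_{f_i} \\
&= 0 + \alpha_1(0) + \alpha_2(0) + \cdots + \alpha_i(1) + \cdots + \alpha_{n-r}(0) \\
&= \alpha_i
\end{aligned}$$

So $\beta_\ell$ simplifies further, and we expand the first term

$$\begin{aligned}
\beta_\ell &= [\mathbf{x}]_{d_\ell} + [B]_{\ell f_1}\alpha_1 + [B]_{\ell f_2}\alpha_2 + [B]_{\ell f_3}\alpha_3 + \cdots + [B]_{\ell f_{n-r}}\alpha_{n-r} \\
&= [\mathbf{c} + \alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \cdots + \alpha_{n-r}\mathbf{u}_{n-r}]_{d_\ell} \\
&\qquad + [B]_{\ell f_1}\alpha_1 + [B]_{\ell f_2}\alpha_2 + [B]_{\ell f_3}\alpha_3 + \cdots + [B]_{\ell f_{n-r}}\alpha_{n-r} \\
&= [\mathbf{c}]_{d_\ell} + \alpha_1[\mathbf{u}_1]_{d_\ell} + \alpha_2[\mathbf{u}_2]_{d_\ell} + \alpha_3[\mathbf{u}_3]_{d_\ell} + \cdots + \alpha_{n-r}[\mathbf{u}_{n-r}]_{d_\ell} \\
&\qquad + [B]_{\ell f_1}\alpha_1 + [B]_{\ell f_2}\alpha_2 + [B]_{\ell f_3}\alpha_3 + \cdots + [B]_{\ell f_{n-r}}\alpha_{n-r} \\
&= [B]_{\ell,n+1} + \alpha_1(-[B]_{\ell f_1}) + \alpha_2(-[B]_{\ell f_2}) + \alpha_3(-[B]_{\ell f_3}) + \cdots + \alpha_{n-r}(-[B]_{\ell f_{n-r}}) \\
&\qquad + [B]_{\ell f_1}\alpha_1 + [B]_{\ell f_2}\alpha_2 + [B]_{\ell f_3}\alpha_3 + \cdots + [B]_{\ell f_{n-r}}\alpha_{n-r} \\
&= [B]_{\ell,n+1}
\end{aligned}$$

So $\beta_\ell$ began as the left-hand side of equation $\ell$ of the system represented by $B$
and we now know it equals $[B]_{\ell,n+1}$, the constant term for equation $\ell$ of this system.
So the arbitrarily chosen vector from $S$ makes every equation of the system true,
and therefore is a solution to the system. So all the elements of $S$ are solutions to
the system.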

For the second half of the proof, assume that $\mathbf{x}$ is a solution vector for the system
having $B$ as its augmented matrix. For convenience and clarity, denote the entries of
$\mathbf{x}$ by $x_i$, in other words, $x_i = [\mathbf{x}]_i$. We desire to show that this solution vector is also
an element of the set $S$. Begin with the observation that a solution vector’s entries
make equation $\ell$ of the system true for all $1 \le \ell \le m$,

$$[B]_{\ell,1}x_1 + [B]_{\ell,2}x_2 + [B]_{\ell,3}x_3 + \cdots + [B]_{\ell,n}x_n = [B]_{\ell,n+1}$$

When $\ell \le r$, the pivot columns of $B$ have zero entries in row $\ell$ with the exception
of column $d_\ell$, which will contain a 1. So for $1 \le \ell \le r$, equation $\ell$ simplifies to

$$1x_{d_\ell} + [B]_{\ell,f_1}x_{f_1} + [B]_{\ell,f_2}x_{f_2} + [B]_{\ell,f_3}x_{f_3} + \cdots + [B]_{\ell,f_{n-r}}x_{f_{n-r}} = [B]_{\ell,n+1}$$

This allows us to write,

$$\begin{aligned}
[\mathbf{x}]_{d_\ell} &= x_{d_\ell} \\
&= [B]_{\ell,n+1} - [B]_{\ell,f_1}x_{f_1} - [B]_{\ell,f_2}x_{f_2} - [B]_{\ell,f_3}x_{f_3} - \cdots - [B]_{\ell,f_{n-r}}x_{f_{n-r}} \\
&= [\mathbf{c}]_{d_\ell} + x_{f_1}[\mathbf{u}_1]_{d_\ell} + x_{f_2}[\mathbf{u}_2]_{d_\ell} + x_{f_3}[\mathbf{u}_3]_{d_\ell} + \cdots + x_{f_{n-r}}[\mathbf{u}_{n-r}]_{d_\ell} \\
&= [\mathbf{c} + x_{f_1}\mathbf{u}_1 + x_{f_2}\mathbf{u}_2 + x_{f_3}\mathbf{u}_3 + \cdots + x_{f_{n-r}}\mathbf{u}_{n-r}]_{d_\ell}
\end{aligned}$$

This tells us that the entries of the solution vector $\mathbf{x}$ corresponding to dependent
variables (indices in $D$), are equal to those of a vector in the set $S$. We still need to
check the other entries of the solution vector $\mathbf{x}$ corresponding to the free variables
(indices in $F$) to see if they are equal to the entries of the same vector in the set $S$.
To this end, suppose $i \in F$ and $i = f_j$. Then

$$\begin{aligned}
[\mathbf{x}]_i &= x_i = x_{f_j} \\
&= 0 + 0x_{f_1} + 0x_{f_2} + 0x_{f_3} + \cdots + 0x_{f_{j-1}} + 1x_{f_j} + 0x_{f_{j+1}} + \cdots + 0x_{f_{n-r}} \\
&= [\mathbf{c}]_i + x_{f_1}[\mathbf{u}_1]_i + x_{f_2}[\mathbf{u}_2]_i + x_{f_3}[\mathbf{u}_3]_i + \cdots + x_{f_j}[\mathbf{u}_j]_i + \cdots + x_{f_{n-r}}[\mathbf{u}_{n-r}]_i \\
&= [\mathbf{c} + x_{f_1}\mathbf{u}_1 + x_{f_2}\mathbf{u}_2 + \cdots + x_{f_{n-r}}\mathbf{u}_{n-r}]_i
\end{aligned}$$

So entries of $\mathbf{x}$ and $\mathbf{c} + x_{f_1}\mathbf{u}_1 + x_{f_2}\mathbf{u}_2 + \cdots + x_{f_{n-r}}\mathbf{u}_{n-r}$ are equal and therefore
by Definition CVE they are equal vectors. Since $x_{f_1}, x_{f_2}, x_{f_3}, \ldots, x_{f_{n-r}}$ are scalars,
this shows us that $\mathbf{x}$ qualifies for membership in $S$. So the set $S$ contains all of the
solutions to the system. ■

Note that both halves of the proof of Theorem VFSLS indicate that $\alpha_i = [\mathbf{x}]_{f_i}$.
In other words, the arbitrary scalars, $\alpha_i$, in the description of the set $S$ actually have
more meaning: they are the values of the free variables $[\mathbf{x}]_{f_i}$, $1 \le i \le n - r$. So we
will often exploit this observation in our descriptions of solution sets.

Theorem VFSLS formalizes what happened in the three steps of Example VFSAD.
The theorem will be useful in proving other theorems, and it is useful since it tells
us an exact procedure for simply describing an infinite solution set. We could program
a computer to implement it, once we have the augmented matrix row-reduced and
have checked that the system is consistent. By Knuth’s definition, this completes
our conversion of linear equation solving from art into science. Notice that it even
applies (but is overkill) in the case of a unique solution. However, as a practical
matter, I prefer the three-step process of Example VFSAD when I need to describe
an infinite solution set. So let us practice some more, but with a bigger example.
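
To make the remark about programming a computer concrete, here is a minimal sketch of the procedure in Theorem VFSLS, written in plain Python rather than Sage. The function name `vector_form` and its return convention are assumptions made for this illustration. The input is assumed to be already in reduced row-echelon form, given as a list of rows with exact (integer or rational) entries; the code does not row-reduce, and it only detects inconsistency through a pivot in the last column.

```python
def vector_form(B):
    """Return (c, us, free) for a reduced row-echelon augmented matrix B.

    c is the fixed vector, us the list of vectors u_1, ..., u_{n-r}, and
    free the 1-based indices of the free variables, following Theorem VFSLS.
    """
    n = len(B[0]) - 1  # number of variables; column n (0-based) holds constants

    # In reduced row-echelon form, row k (0-based) has its leading 1 in
    # pivot column pivots[k]; zero rows at the bottom contribute nothing.
    pivots = []
    for row in B:
        for j, entry in enumerate(row):
            if entry != 0:
                pivots.append(j)
                break
    if n in pivots:
        raise ValueError("inconsistent system: the last column is a pivot column")

    free = [j for j in range(n) if j not in pivots]

    # [c]_i is 0 for free indices and [B]_{k,n+1} for the k-th pivot index.
    c = [0] * n
    for k, d in enumerate(pivots):
        c[d] = B[k][n]

    # [u_j]_i is the 0/1 pattern on free indices and -[B]_{k,f_j} on pivots.
    us = []
    for f in free:
        u = [0] * n
        u[f] = 1
        for k, d in enumerate(pivots):
            u[d] = -B[k][f]
        us.append(u)

    return c, us, [f + 1 for f in free]


# Row-reduced augmented matrix of Archetype D, from Example VFSAD.
rref_D = [
    [1, 0, 3, -2, 4],
    [0, 1, 1, -3, 0],
    [0, 0, 0, 0, 0],
]
c, us, free = vector_form(rref_D)
print(c)
print(us)
print(free)
```

Run on the matrix $B$ of Example VFS, the same function should return the four vectors found there by hand, which is a reasonable sanity check on both the code and the hand computation.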

Example  VFSAI  Vector  form  of  solutions  for  Archetype  I 

Archetype I is a linear system of $m = 4$ equations in $n = 7$ variables. Row-reducing
the augmented matrix yields

$$\begin{bmatrix}
\boxed{1} & 4 & 0 & 0 & 2 & 1 & -3 & 4 \\
0 & 0 & \boxed{1} & 0 & 1 & -3 & 5 & 2 \\
0 & 0 & 0 & \boxed{1} & 2 & -6 & 6 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$

and we see $r = 3$ pivot columns, with indices $D = \{1, 3, 4\}$. So the $r = 3$ dependent
variables are $x_1$, $x_3$, $x_4$. The non-pivot columns have indices in $F = \{2, 5, 6, 7, 8\}$,
so the $n - r = 4$ free variables are $x_2$, $x_5$, $x_6$, $x_7$.

Step 1. Write the vector of variables ($\mathbf{x}$) as a fixed vector ($\mathbf{c}$), plus a linear
combination of $n - r = 4$ vectors ($\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$, $\mathbf{u}_4$), using the free variables as the
scalars.


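Step 2. For each free variable, use 0’s and 1’s to ensure equality for the corresponding
entry of the vectors. Take note of the pattern of 0’s and 1’s at this stage,
because this is the best look you will have at it. We will state an important theorem
in the next section and the proof will essentially rely on this observation.

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} \\ 0 \\ \\ \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_2\begin{bmatrix} \\ 1 \\ \\ \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_5\begin{bmatrix} \\ 0 \\ \\ \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ x_6\begin{bmatrix} \\ 0 \\ \\ \\ 0 \\ 1 \\ 0 \end{bmatrix}
+ x_7\begin{bmatrix} \\ 0 \\ \\ \\ 0 \\ 0 \\ 1 \end{bmatrix}$$
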

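Step 3. For each dependent variable, use the augmented matrix to formulate an
equation expressing the dependent variable as a constant plus multiples of the free
variables. Convert this equation into entries of the vectors that ensure equality for
each dependent variable, one at a time.

$$x_1 = 4 - 4x_2 - 2x_5 - 1x_6 + 3x_7 \quad\Rightarrow\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 4 \\ 0 \\ \\ \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_2\begin{bmatrix} -4 \\ 1 \\ \\ \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_5\begin{bmatrix} -2 \\ 0 \\ \\ \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ x_6\begin{bmatrix} -1 \\ 0 \\ \\ \\ 0 \\ 1 \\ 0 \end{bmatrix}
+ x_7\begin{bmatrix} 3 \\ 0 \\ \\ \\ 0 \\ 0 \\ 1 \end{bmatrix}$$

$$x_3 = 2 + 0x_2 - x_5 + 3x_6 - 5x_7 \quad\Rightarrow\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 4 \\ 0 \\ 2 \\ \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_2\begin{bmatrix} -4 \\ 1 \\ 0 \\ \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_5\begin{bmatrix} -2 \\ 0 \\ -1 \\ \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ x_6\begin{bmatrix} -1 \\ 0 \\ 3 \\ \\ 0 \\ 1 \\ 0 \end{bmatrix}
+ x_7\begin{bmatrix} 3 \\ 0 \\ -5 \\ \\ 0 \\ 0 \\ 1 \end{bmatrix}$$

$$x_4 = 1 + 0x_2 - 2x_5 + 6x_6 - 6x_7 \quad\Rightarrow\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 4 \\ 0 \\ 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_2\begin{bmatrix} -4 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_5\begin{bmatrix} -2 \\ 0 \\ -1 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ x_6\begin{bmatrix} -1 \\ 0 \\ 3 \\ 6 \\ 0 \\ 1 \\ 0 \end{bmatrix}
+ x_7\begin{bmatrix} 3 \\ 0 \\ -5 \\ -6 \\ 0 \\ 0 \\ 1 \end{bmatrix}$$
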

Beezer: A First Course in Linear Algebra


We  can  now  use  this  final  expression  to  quickly  build  solutions  to  the  system. 
You  might  try  to  recreate  each  of  the  solutions  listed  in  the  write-up  for  Archetype 
I.  (Hint:  look  at  the  values  of  the  free  variables  in  each  solution,  and  notice  that  the 
vector  c has  0’s  in  these  locations.) 

Even  better,  we  have  a description  of  the  infinite  solution  set,  based  on  just  5 
vectors,  which  we  combine  in  linear  combinations  to  produce  solutions. 

Whenever we discuss Archetype I you know that is your cue to go work through
Archetype J by yourself. Remember to take note of the 0/1 pattern at the conclusion
of Step 2. Have fun; we won’t go anywhere while you’re away. △
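
The 0/1 pattern from Step 2 can be made visible by extracting just the entries in the free positions. The following sketch (plain Python, with illustrative variable names that are not from the text) does this for the five vectors of Example VFSAI, and also builds one solution and checks it against the rows of the row-reduced augmented matrix given in the example.

```python
c = [4, 0, 2, 1, 0, 0, 0]
u1 = [-4, 1, 0, 0, 0, 0, 0]
u2 = [-2, 0, -1, -2, 1, 0, 0]
u3 = [-1, 0, 3, 6, 0, 1, 0]
u4 = [3, 0, -5, -6, 0, 0, 1]
us = [u1, u2, u3, u4]
free = [2, 5, 6, 7]  # 1-based indices of the free variables

# Row f of `pattern` lists entry f of u1, u2, u3, u4 for each free index f.
pattern = [[u[f - 1] for u in us] for f in free]
for row in pattern:
    print(row)

# The fixed vector has zeros in every free position.
print([c[f - 1] for f in free])

# Build the solution with x2 = x5 = x6 = x7 = 1 and test it.
rref = [
    [1, 4, 0, 0, 2, 1, -3, 4],
    [0, 0, 1, 0, 1, -3, 5, 2],
    [0, 0, 0, 1, 2, -6, 6, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
x = [c[i] + sum(u[i] for u in us) for i in range(7)]
ok = all(sum(row[j] * x[j] for j in range(7)) == row[7] for row in rref)
print(x, ok)
```

The extracted pattern is a square array with 1's on the diagonal and 0's elsewhere, one row per free variable.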

This  technique  is  so  important,  that  we  will  do  one  more  example.  However,  an 
important  distinction  will  be  that  this  system  is  homogeneous. 

Example  VFSAL  Vector  form  of  solutions  for  Archetype  L 
Archetype L is presented simply as the $5 \times 5$ matrix

$$L = \begin{bmatrix}
-2 & -1 & -2 & -4 & 4 \\
-6 & -5 & -4 & -4 & 6 \\
10 & 7 & 7 & 10 & -13 \\
-7 & -5 & -6 & -9 & 10 \\
-4 & -3 & -4 & -6 & 6
\end{bmatrix}$$


We will employ this matrix here as the coefficient matrix of a homogeneous
system and reference this matrix as $L$. So we are solving the homogeneous system
$\mathcal{LS}(L, \mathbf{0})$ having $m = 5$ equations in $n = 5$ variables. If we built the augmented
matrix, we would add a sixth column to $L$ containing all zeros. As we did row
operations, this sixth column would remain all zeros. So instead we will row-reduce
the coefficient matrix, and mentally remember the missing sixth column of zeros.
This  row-reduced  matrix  is 

-0001  -2 
0 0 0-22 
0 0 0 2 -1 

0 0 0 0 0 

_ 0 0 0 0 0 _ 

and we see $r = 3$ pivot columns, with indices $D = \{1, 2, 3\}$. So the $r = 3$ dependent
variables are $x_1$, $x_2$, $x_3$. The non-pivot columns have indices $F = \{4, 5\}$, so the
$n - r = 2$ free variables are $x_4$, $x_5$. Notice that if we had included the all-zero vector
of constants to form the augmented matrix for the system, then the index 6 would
have appeared in the set $F$, and subsequently would have been ignored when listing
the free variables. So nothing is lost by not creating an augmented matrix (in the
case of a homogeneous system). And maybe it is an improvement, since now every
index in $F$ can be used to reference a variable of the linear system.
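
Reading off the sets D and F from a row-reduced matrix is mechanical, and the short Python sketch below illustrates it for the row-reduced version of L. The function name `pivot_and_free_indices` is a hypothetical helper introduced only for this illustration, and it assumes its input is already in reduced row-echelon form.

```python
def pivot_and_free_indices(rref_rows):
    """Return (D, F): 1-based indices of pivot and non-pivot columns.

    Assumes rref_rows is already in reduced row-echelon form.
    """
    n = len(rref_rows[0])
    pivots = []
    for row in rref_rows:
        # The leading (first nonzero) entry of a nonzero row marks a pivot column.
        for j, entry in enumerate(row):
            if entry != 0:
                pivots.append(j + 1)
                break
    free = [j for j in range(1, n + 1) if j not in pivots]
    return pivots, free


# Row-reduced coefficient matrix of Archetype L, as given in the example.
B = [
    [1, 0, 0, 1, -2],
    [0, 1, 0, -2, 2],
    [0, 0, 1, 2, -1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
D, F = pivot_and_free_indices(B)
print(D, F)  # [1, 2, 3] [4, 5]
```

Because only the coefficient matrix is supplied, every index in F refers to an actual variable, which matches the remark above.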

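Step 1. Write the vector of variables ($\mathbf{x}$) as a fixed vector ($\mathbf{c}$), plus a linear
combination of $n - r = 2$ vectors ($\mathbf{u}_1$, $\mathbf{u}_2$), using the free variables as the scalars.

Step 2. For each free variable, use 0's and 1's to ensure equality for the
corresponding entry of the vectors. Take note of the pattern of 0's and 1's at this stage,
even if it is not as illuminating as in other examples.

Step 3. For each dependent variable, use the augmented matrix to formulate an
equation expressing the dependent variable as a constant plus multiples of the free
variables. Do not forget about the "missing" sixth column being full of zeros. Convert
this equation into entries of the vectors that ensure equality for each dependent
variable, one at a time. The three nonzero rows of the row-reduced matrix give

\begin{align*}
x_1 &= 0 - 1x_4 + 2x_5 \\
x_2 &= 0 + 2x_4 - 2x_5 \\
x_3 &= 0 - 2x_4 + 1x_5
\end{align*}

so that, once the last dependent variable has been handled,

\[
x_3 = 0 - 2x_4 + 1x_5 \quad\Rightarrow\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} -1 \\ 2 \\ -2 \\ 1 \\ 0 \end{bmatrix}
+ x_5 \begin{bmatrix} 2 \\ -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}
\]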

The vector $\mathbf{c}$ will always have 0's in the entries corresponding to free variables.
However, since we are solving a homogeneous system, the row-reduced augmented
matrix has zeros in column $n + 1 = 6$, and hence all the entries of $\mathbf{c}$ are zero. So we
can write

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
= \mathbf{0}
+ x_4 \begin{bmatrix} -1 \\ 2 \\ -2 \\ 1 \\ 0 \end{bmatrix}
+ x_5 \begin{bmatrix} 2 \\ -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}
= x_4 \begin{bmatrix} -1 \\ 2 \\ -2 \\ 1 \\ 0 \end{bmatrix}
+ x_5 \begin{bmatrix} 2 \\ -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}
\]

It will always happen that the solutions to a homogeneous system have $\mathbf{c} = \mathbf{0}$
(even in the case of a unique solution?). So our expression for the solutions is a
bit more pleasing. In this example it says that the solutions are all possible linear
combinations of the two vectors

\[
\mathbf{u}_1 = \begin{bmatrix} -1 \\ 2 \\ -2 \\ 1 \\ 0 \end{bmatrix}
\quad\text{and}\quad
\mathbf{u}_2 = \begin{bmatrix} 2 \\ -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}
\]

with no mention of any fixed vector entering into the linear combination.

This observation will motivate our next section and the main definition of that
section, and after that we will conclude the section by formalizing this situation. △
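
As a quick check of Example VFSAL, the following Python sketch multiplies the matrix L by several linear combinations of the two vectors found above and records the results. The helper names `mat_vec` and `linear_combination`, and the particular values chosen for the free variables, are illustrative assumptions and not part of the text.

```python
def mat_vec(matrix, vector):
    """Matrix-vector product, computed one row at a time."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def linear_combination(scalars, vectors):
    """Return scalars[0]*vectors[0] + scalars[1]*vectors[1] + ..."""
    size = len(vectors[0])
    return [sum(s * v[i] for s, v in zip(scalars, vectors)) for i in range(size)]


L = [
    [-2, -1, -2, -4, 4],
    [-6, -5, -4, -4, 6],
    [10, 7, 7, 10, -13],
    [-7, -5, -6, -9, 10],
    [-4, -3, -4, -6, 6],
]
u1 = [-1, 2, -2, 1, 0]
u2 = [2, -2, 1, 0, 1]

# Arbitrary sample values for the free variables x4 and x5.
results = []
for x4, x5 in [(1, 0), (0, 1), (3, -2), (-5, 7)]:
    x = linear_combination([x4, x5], [u1, u2])
    results.append(mat_vec(L, x))
    print(x, results[-1])
```

Every product printed is the zero vector of size 5, as expected for solutions of the homogeneous system.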


Subsection  PSHS 

Particular  Solutions,  Homogeneous  Solutions 

The  next  theorem  tells  us  that  in  order  to  find  all  of  the  solutions  to  a linear  system 
of  equations,  it  is  sufficient  to  find  just  one  solution,  and  then  find  all  of  the  solutions 
to  the  corresponding  homogeneous  system.  This  explains  part  of  our  interest  in  the 
null  space,  the  set  of  all  solutions  to  a homogeneous  system. 

Theorem  PSPHS  Particular  Solution  Plus  Homogeneous  Solutions 

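Suppose that $\mathbf{w}$ is one solution to the linear system of equations $\mathcal{LS}(A, \mathbf{b})$. Then $\mathbf{y}$
is a solution to $\mathcal{LS}(A, \mathbf{b})$ if and only if $\mathbf{y} = \mathbf{w} + \mathbf{z}$ for some vector $\mathbf{z} \in \mathcal{N}(A)$.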


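Proof. Let $\mathbf{A}_1, \mathbf{A}_2, \mathbf{A}_3, \ldots, \mathbf{A}_n$ be the columns of the coefficient matrix $A$.

($\Leftarrow$) Suppose $\mathbf{y} = \mathbf{w} + \mathbf{z}$ and $\mathbf{z} \in \mathcal{N}(A)$. Then

\begin{align*}
\mathbf{b} &= [\mathbf{w}]_1 \mathbf{A}_1 + [\mathbf{w}]_2 \mathbf{A}_2 + [\mathbf{w}]_3 \mathbf{A}_3 + \cdots + [\mathbf{w}]_n \mathbf{A}_n && \text{Theorem SLSLC} \\
&= [\mathbf{w}]_1 \mathbf{A}_1 + [\mathbf{w}]_2 \mathbf{A}_2 + [\mathbf{w}]_3 \mathbf{A}_3 + \cdots + [\mathbf{w}]_n \mathbf{A}_n + \mathbf{0} && \text{Property ZC} \\
&= [\mathbf{w}]_1 \mathbf{A}_1 + [\mathbf{w}]_2 \mathbf{A}_2 + [\mathbf{w}]_3 \mathbf{A}_3 + \cdots + [\mathbf{w}]_n \mathbf{A}_n \\
&\quad + [\mathbf{z}]_1 \mathbf{A}_1 + [\mathbf{z}]_2 \mathbf{A}_2 + [\mathbf{z}]_3 \mathbf{A}_3 + \cdots + [\mathbf{z}]_n \mathbf{A}_n && \text{Theorem SLSLC} \\
&= ([\mathbf{w}]_1 + [\mathbf{z}]_1) \mathbf{A}_1 + ([\mathbf{w}]_2 + [\mathbf{z}]_2) \mathbf{A}_2 \\
&\quad + ([\mathbf{w}]_3 + [\mathbf{z}]_3) \mathbf{A}_3 + \cdots + ([\mathbf{w}]_n + [\mathbf{z}]_n) \mathbf{A}_n && \text{Theorem VSPCV} \\
&= [\mathbf{w} + \mathbf{z}]_1 \mathbf{A}_1 + [\mathbf{w} + \mathbf{z}]_2 \mathbf{A}_2 + [\mathbf{w} + \mathbf{z}]_3 \mathbf{A}_3 + \cdots + [\mathbf{w} + \mathbf{z}]_n \mathbf{A}_n && \text{Definition CVA} \\
&= [\mathbf{y}]_1 \mathbf{A}_1 + [\mathbf{y}]_2 \mathbf{A}_2 + [\mathbf{y}]_3 \mathbf{A}_3 + \cdots + [\mathbf{y}]_n \mathbf{A}_n && \text{Definition of } \mathbf{y}
\end{align*}

Applying Theorem SLSLC we see that the vector $\mathbf{y}$ is a solution to $\mathcal{LS}(A, \mathbf{b})$.

($\Rightarrow$) Suppose $\mathbf{y}$ is a solution to $\mathcal{LS}(A, \mathbf{b})$. Then

\begin{align*}
\mathbf{0} &= \mathbf{b} - \mathbf{b} \\
&= [\mathbf{y}]_1 \mathbf{A}_1 + [\mathbf{y}]_2 \mathbf{A}_2 + [\mathbf{y}]_3 \mathbf{A}_3 + \cdots + [\mathbf{y}]_n \mathbf{A}_n && \text{Theorem SLSLC} \\
&\quad - \left([\mathbf{w}]_1 \mathbf{A}_1 + [\mathbf{w}]_2 \mathbf{A}_2 + [\mathbf{w}]_3 \mathbf{A}_3 + \cdots + [\mathbf{w}]_n \mathbf{A}_n\right) \\
&= ([\mathbf{y}]_1 - [\mathbf{w}]_1) \mathbf{A}_1 + ([\mathbf{y}]_2 - [\mathbf{w}]_2) \mathbf{A}_2 && \text{Theorem VSPCV} \\
&\quad + ([\mathbf{y}]_3 - [\mathbf{w}]_3) \mathbf{A}_3 + \cdots + ([\mathbf{y}]_n - [\mathbf{w}]_n) \mathbf{A}_n \\
&= [\mathbf{y} - \mathbf{w}]_1 \mathbf{A}_1 + [\mathbf{y} - \mathbf{w}]_2 \mathbf{A}_2 && \text{Definition CVA} \\
&\quad + [\mathbf{y} - \mathbf{w}]_3 \mathbf{A}_3 + \cdots + [\mathbf{y} - \mathbf{w}]_n \mathbf{A}_n
\end{align*}

By Theorem SLSLC we see that the vector $\mathbf{y} - \mathbf{w}$ is a solution to the homogeneous
system $\mathcal{LS}(A, \mathbf{0})$ and by Definition NSM, $\mathbf{y} - \mathbf{w} \in \mathcal{N}(A)$. In other words, $\mathbf{y} - \mathbf{w} = \mathbf{z}$
for some vector $\mathbf{z} \in \mathcal{N}(A)$. Rewritten, this is $\mathbf{y} = \mathbf{w} + \mathbf{z}$, as desired. ■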

After  proving  Theorem  NMUS  we  commented  (insufficiently)  on  the  negation  of 
one  half  of  the  theorem.  Nonsingular  coefficient  matrices  lead  to  unique  solutions  for 
every  choice  of  the  vector  of  constants.  What  does  this  say  about  singular  matrices? 
A singular  matrix  A has  a nontrivial  null  space  (Theorem  NMTNS).  For  a given 
vector of constants, $\mathbf{b}$, the system $\mathcal{LS}(A, \mathbf{b})$ could be inconsistent, meaning there
are  no  solutions.  But  if  there  is  at  least  one  solution  (w),  then  Theorem  PSPHS  tells 
us  there  will  be  infinitely  many  solutions  because  of  the  role  of  the  infinite  null  space 
for  a singular  matrix.  So  a system  of  equations  with  a singular  coefficient  matrix 
never  has  a unique  solution.  Notice  that  this  is  the  contrapositive  of  the  statement 
in  Exercise  NM.T31.  With  a singular  coefficient  matrix,  either  there  are  no  solutions, 
or  infinitely  many  solutions,  depending  on  the  choice  of  the  vector  of  constants  (b). 
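
A small worked case may make this dichotomy concrete. The 2 × 2 matrix and the two vectors of constants below are chosen purely for illustration and do not come from the archetypes.

```latex
% Illustrative singular matrix (an assumption, not one of the archetypes).
\[
A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix},
\qquad
\mathcal{N}(A) = \left\{ t \begin{bmatrix} -2 \\ 1 \end{bmatrix} \,\middle|\, t \in \mathbb{C} \right\}
\]
% Consistent choice of constants: one solution w, hence infinitely many.
\[
\mathbf{b} = \begin{bmatrix} 3 \\ 6 \end{bmatrix}:
\quad
\mathbf{w} = \begin{bmatrix} 3 \\ 0 \end{bmatrix}
\text{ is a solution, so every solution is }
\begin{bmatrix} 3 \\ 0 \end{bmatrix} + t \begin{bmatrix} -2 \\ 1 \end{bmatrix}
\]
% Inconsistent choice: doubling the first equation gives 2x_1 + 4x_2 = 6, not 5.
\[
\mathbf{b} = \begin{bmatrix} 3 \\ 5 \end{bmatrix}:
\quad
x_1 + 2x_2 = 3 \text{ and } 2x_1 + 4x_2 = 5 \text{ cannot both hold, so there are no solutions}
\]
```

In neither case is there exactly one solution, which is the behavior Theorem PSPHS predicts for a singular coefficient matrix.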

Example  PSHS  Particular  solutions,  homogeneous  solutions,  Archetype  D 
Archetype D is a consistent system of equations with a nontrivial null space. Let
$A$ denote the coefficient matrix of this system. The write-up for this system begins
with three solutions,

\[
\mathbf{y}_1 = \begin{bmatrix} 0 \\ 1 \\ 2 \\ 1 \end{bmatrix}
\qquad
\mathbf{y}_2 = \begin{bmatrix} 4 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\qquad
\mathbf{y}_3 = \begin{bmatrix} 7 \\ 8 \\ 1 \\ 3 \end{bmatrix}
\]

We will choose to have $\mathbf{y}_1$ play the role of $\mathbf{w}$ in the statement of Theorem PSPHS,
though any one of the three vectors listed here (or others) could have been chosen. To
illustrate the theorem, we should be able to write each of these three solutions as
the vector $\mathbf{w}$ plus a solution to the corresponding homogeneous system of equations.
Since $\mathbf{0}$ is always a solution to a homogeneous system we can easily write

\[
\mathbf{y}_1 = \mathbf{w} = \mathbf{w} + \mathbf{0}.
\]

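The vectors $\mathbf{y}_2$ and $\mathbf{y}_3$ will require a bit more effort. Solutions to the homogeneous
system $\mathcal{LS}(A, \mathbf{0})$ are exactly the elements of the null space of the coefficient matrix,
which by an application of Theorem VFSLS is

\[
\mathcal{N}(A) = \left\{
x_3 \begin{bmatrix} -3 \\ -1 \\ 1 \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix}
\,\middle|\, x_3, x_4 \in \mathbb{C} \right\}
\]

Then

\begin{align*}
\mathbf{y}_2 = \begin{bmatrix} 4 \\ 0 \\ 0 \\ 0 \end{bmatrix}
&= \begin{bmatrix} 0 \\ 1 \\ 2 \\ 1 \end{bmatrix} + \begin{bmatrix} 4 \\ -1 \\ -2 \\ -1 \end{bmatrix} \\
&= \begin{bmatrix} 0 \\ 1 \\ 2 \\ 1 \end{bmatrix}
+ \left( (-2) \begin{bmatrix} -3 \\ -1 \\ 1 \\ 0 \end{bmatrix}
+ (-1) \begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix} \right)
= \mathbf{w} + \mathbf{z}_2
\end{align*}

where

\[
\mathbf{z}_2 = \begin{bmatrix} 4 \\ -1 \\ -2 \\ -1 \end{bmatrix}
= (-2) \begin{bmatrix} -3 \\ -1 \\ 1 \\ 0 \end{bmatrix}
+ (-1) \begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix}
\]

is obviously a solution of the homogeneous system since it is written as a linear
combination of the vectors describing the null space of the coefficient matrix (or as
a check, you could just evaluate the equations in the homogeneous system with $\mathbf{z}_2$).

Again

\begin{align*}
\mathbf{y}_3 = \begin{bmatrix} 7 \\ 8 \\ 1 \\ 3 \end{bmatrix}
&= \begin{bmatrix} 0 \\ 1 \\ 2 \\ 1 \end{bmatrix} + \begin{bmatrix} 7 \\ 7 \\ -1 \\ 2 \end{bmatrix} \\
&= \begin{bmatrix} 0 \\ 1 \\ 2 \\ 1 \end{bmatrix}
+ \left( (-1) \begin{bmatrix} -3 \\ -1 \\ 1 \\ 0 \end{bmatrix}
+ 2 \begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix} \right)
= \mathbf{w} + \mathbf{z}_3
\end{align*}

where

\[
\mathbf{z}_3 = \begin{bmatrix} 7 \\ 7 \\ -1 \\ 2 \end{bmatrix}
= (-1) \begin{bmatrix} -3 \\ -1 \\ 1 \\ 0 \end{bmatrix}
+ 2 \begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix}
\]

is obviously a solution of the homogeneous system since it is written as a linear
combination of the vectors describing the null space of the coefficient matrix (or as
a check, you could just evaluate the equations in the homogeneous system with $\mathbf{z}_3$).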

Here is another view of this theorem, in the context of this example. Grab two
new solutions of the original system of equations, say

\[
\mathbf{y}_4 = \begin{bmatrix} 11 \\ 0 \\ -3 \\ -1 \end{bmatrix}
\qquad
\mathbf{y}_5 = \begin{bmatrix} -4 \\ 2 \\ 4 \\ 2 \end{bmatrix}
\]

and form their difference,

\[
\mathbf{u} = \begin{bmatrix} 11 \\ 0 \\ -3 \\ -1 \end{bmatrix}
- \begin{bmatrix} -4 \\ 2 \\ 4 \\ 2 \end{bmatrix}
= \begin{bmatrix} 15 \\ -2 \\ -7 \\ -3 \end{bmatrix}.
\]

It is no accident that $\mathbf{u}$ is a solution to the homogeneous system (check this!). In
other words, the difference between any two solutions to a linear system of equations
is an element of the null space of the coefficient matrix. This is an equivalent way to
state Theorem PSPHS. (See Exercise MM.T50). △
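
The "check this!" above can be carried out numerically. The sketch below is hedged in one important respect: the coefficient matrix and vector of constants of Archetype D are not reproduced in this section, so the values of `A` and `b` used here are an assumption taken from the Archetype D write-up. They are at least consistent with all five solutions listed in this example, which the code confirms first.

```python
from itertools import combinations


def mat_vec(matrix, vector):
    """Matrix-vector product, computed one row at a time."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# ASSUMPTION: Archetype D's coefficient matrix and vector of constants.
A = [
    [2, 1, 7, -7],
    [-3, 4, -5, -6],
    [1, 1, 4, -5],
]
b = [8, -12, 4]

# The five solutions y1, ..., y5 listed in Example PSHS.
solutions = [
    [0, 1, 2, 1],
    [4, 0, 0, 0],
    [7, 8, 1, 3],
    [11, 0, -3, -1],
    [-4, 2, 4, 2],
]

# Each listed vector really does solve the system.
all_solve = all(mat_vec(A, y) == b for y in solutions)

# The difference of any two solutions lies in the null space of A.
differences_in_null_space = all(
    mat_vec(A, [p - q for p, q in zip(y, w)]) == [0, 0, 0]
    for y, w in combinations(solutions, 2)
)
print(all_solve, differences_in_null_space)  # True True
```

The pair (y4, y5) is one of the ten differences tested, so this includes the vector u formed above.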

The ideas of this subsection will appear again in Chapter LT when we discuss
pre-images of linear transformations (Definition PI).


Reading  Questions 


1. Earlier, a reading question asked you to solve the system of equations

\begin{align*}
2x_1 + 3x_2 - x_3 &= 0 \\
x_1 + 2x_2 + x_3 &= 3 \\
x_1 + 3x_2 + 3x_3 &= 7
\end{align*}

Use a linear combination to rewrite this system of equations as a vector equality.

2. Find a linear combination of the vectors

\[
\begin{bmatrix} 2 \\ 0 \\ 4 \end{bmatrix},\quad
\begin{bmatrix} -1 \\ 3 \\ -5 \end{bmatrix}
\]

that equals the vector

\[
\begin{bmatrix} 1 \\ -9 \\ 11 \end{bmatrix}.
\]

3. The matrix below is the augmented matrix of a system of equations, row-reduced to
reduced row-echelon form. Write the vector form of the solutions to the system.

\[
\begin{bmatrix}
1 & 3 & 0 & 6 & 0 & 9 \\
0 & 0 & 1 & -2 & 0 & -8 \\
0 & 0 & 0 & 0 & 1 & 3
\end{bmatrix}
\]


Exercises 

C21 Consider each archetype that is a system of equations. For individual solutions
listed (both for the original system and the corresponding homogeneous system) express
the vector of constants as a linear combination of the columns of the coefficient matrix, as
guaranteed by Theorem SLSLC. Verify this equality by computing the linear combination.
For systems with no solutions, recognize that it is then impossible to write the vector of
constants as a linear combination of the columns of the coefficient matrix. Note too, for
homogeneous systems, that the solutions give rise to linear combinations that equal the
zero vector.

Archetype  A,  Archetype  B,  Archetype  C,  Archetype  D,  Archetype  E,  Archetype  F,  Archetype 
G,  Archetype  H,  Archetype  I,  Archetype  J 

C22 Consider each archetype that is a system of equations. Write elements of the solution
set in vector form, as guaranteed by Theorem VFSLS.


Archetype  A,  Archetype  B,  Archetype  C,  Archetype  D,  Archetype  E,  Archetype  F,  Archetype 
G,  Archetype  H,  Archetype  I,  Archetype  J 

C40 Find the vector form of the solutions to the system of equations below.

\begin{align*}
2x_1 - 4x_2 + 3x_3 + x_5 &= 6 \\
x_1 - 2x_2 - 2x_3 + 14x_4 - 4x_5 &= 15 \\
x_1 - 2x_2 + x_3 + 2x_4 + x_5 &= -1 \\
-2x_1 + 4x_2 - 12x_4 + x_5 &= -7
\end{align*}

C41 Find the vector form of the solutions to the system of equations below.

\begin{align*}
-2x_1 - 1x_2 - 8x_3 + 8x_4 + 4x_5 - 9x_6 - 1x_7 - 1x_8 - 18x_9 &= 3 \\
3x_1 - 2x_2 + 5x_3 + 2x_4 - 2x_5 - 5x_6 + 1x_7 + 2x_8 + 15x_9 &= 10 \\
4x_1 - 2x_2 + 8x_3 + 2x_5 - 14x_6 - 2x_8 + 2x_9 &= 36 \\
-1x_1 + 2x_2 + 1x_3 - 6x_4 + 7x_6 - 1x_7 - 3x_9 &= -8 \\
3x_1 + 2x_2 + 13x_3 - 14x_4 - 1x_5 + 5x_6 - 1x_8 + 12x_9 &= 15 \\
-2x_1 + 2x_2 - 2x_3 - 4x_4 + 1x_5 + 6x_6 - 2x_7 - 2x_8 - 15x_9 &= -7
\end{align*}

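M10 Example TLC asks if the vector

\[
\begin{bmatrix} 13 \\ 15 \\ 5 \\ -17 \\ 2 \\ 25 \end{bmatrix}
\]

can be written as a linear combination of the four vectors

\[
\mathbf{u}_1 = \begin{bmatrix} 2 \\ 4 \\ -3 \\ 1 \\ 2 \\ 9 \end{bmatrix}
\quad
\mathbf{u}_2 = \begin{bmatrix} 6 \\ 3 \\ 0 \\ -2 \\ 1 \\ 4 \end{bmatrix}
\quad
\mathbf{u}_3 = \begin{bmatrix} -5 \\ 2 \\ 1 \\ 1 \\ -3 \\ 0 \end{bmatrix}
\quad
\mathbf{u}_4 = \begin{bmatrix} 3 \\ 2 \\ -5 \\ 7 \\ 1 \\ 3 \end{bmatrix}
\]

Can it? Can any vector in $\mathbb{C}^6$ be written as a linear combination of the four vectors
$\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$, $\mathbf{u}_4$?

M11 At the end of Example VFS, the vector $\mathbf{w}$ is claimed to be a solution to the linear
system under discussion. Verify that $\mathbf{w}$ really is a solution. Then determine the four scalars
that express $\mathbf{w}$ as a linear combination of $\mathbf{c}$, $\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$.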


Section  SS 
Spanning  Sets 

In  this  section  we  will  provide  an  extremely  compact  way  to  describe  an  infinite  set 
of  vectors,  making  use  of  linear  combinations.  This  will  give  us  a convenient  way  to 
describe  the  solution  set  of  a linear  system,  the  null  space  of  a matrix,  and  many 
other  sets  of  vectors. 


Subsection  SSV 
Span  of  a Set  of  Vectors 

In  Example  VFSAL  we  saw  the  solution  set  of  a homogeneous  system  described 
as  all  possible  linear  combinations  of  two  particular  vectors.  This  is  a useful  way 
to  construct  or  describe  infinite  sets  of  vectors,  so  we  encapsulate  the  idea  in  a 
definition. 


Definition  SSCV  Span  of  a Set  of  Column  Vectors 

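Given a set of vectors $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_p\}$, their span, $\langle S \rangle$, is the set of all
possible linear combinations of $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_p$. Symbolically,

\begin{align*}
\langle S \rangle &= \left\{ \alpha_1 \mathbf{u}_1 + \alpha_2 \mathbf{u}_2 + \alpha_3 \mathbf{u}_3 + \cdots + \alpha_p \mathbf{u}_p \,\middle|\, \alpha_i \in \mathbb{C},\ 1 \le i \le p \right\} \\
&= \left\{ \sum_{i=1}^{p} \alpha_i \mathbf{u}_i \,\middle|\, \alpha_i \in \mathbb{C},\ 1 \le i \le p \right\}
\end{align*}

□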


The span is just a set of vectors, though in all but one situation it is an infinite
set. (Just when is it not infinite?) So we start with a finite collection of vectors $S$ ($p$
of them to be precise), and use this finite set to describe an infinite set of vectors,
$\langle S \rangle$. Confusing the finite set $S$ with the infinite set $\langle S \rangle$ is one of the most persistent
problems in understanding introductory linear algebra. We will see this construction
repeatedly, so let us work through some examples to get comfortable with it. The
most obvious question about a set is if a particular item of the correct type is in the
set, or not in the set.


Example  ABS  A basic  span 
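Consider the set of 5 vectors, $S$, from $\mathbb{C}^4$

\[
S = \left\{
\begin{bmatrix}1\\1\\3\\1\end{bmatrix},\
\begin{bmatrix}2\\1\\2\\-1\end{bmatrix},\
\begin{bmatrix}7\\3\\5\\-5\end{bmatrix},\
\begin{bmatrix}1\\1\\-1\\2\end{bmatrix},\
\begin{bmatrix}-1\\0\\9\\0\end{bmatrix}
\right\}
\]

and consider the infinite set of vectors $\langle S \rangle$ formed from all possible linear combinations
of the elements of $S$. Here are four vectors we definitely know are elements of $\langle S \rangle$,
since we will construct them in accordance with Definition SSCV,

\begin{align*}
\mathbf{w} &= (2)\begin{bmatrix}1\\1\\3\\1\end{bmatrix} + (1)\begin{bmatrix}2\\1\\2\\-1\end{bmatrix} + (-1)\begin{bmatrix}7\\3\\5\\-5\end{bmatrix} + (2)\begin{bmatrix}1\\1\\-1\\2\end{bmatrix} + (3)\begin{bmatrix}-1\\0\\9\\0\end{bmatrix} = \begin{bmatrix}-4\\2\\28\\10\end{bmatrix} \\
\mathbf{x} &= (5)\begin{bmatrix}1\\1\\3\\1\end{bmatrix} + (-6)\begin{bmatrix}2\\1\\2\\-1\end{bmatrix} + (-3)\begin{bmatrix}7\\3\\5\\-5\end{bmatrix} + (4)\begin{bmatrix}1\\1\\-1\\2\end{bmatrix} + (2)\begin{bmatrix}-1\\0\\9\\0\end{bmatrix} = \begin{bmatrix}-26\\-6\\2\\34\end{bmatrix} \\
\mathbf{y} &= (1)\begin{bmatrix}1\\1\\3\\1\end{bmatrix} + (0)\begin{bmatrix}2\\1\\2\\-1\end{bmatrix} + (1)\begin{bmatrix}7\\3\\5\\-5\end{bmatrix} + (0)\begin{bmatrix}1\\1\\-1\\2\end{bmatrix} + (1)\begin{bmatrix}-1\\0\\9\\0\end{bmatrix} = \begin{bmatrix}7\\4\\17\\-4\end{bmatrix} \\
\mathbf{z} &= (0)\begin{bmatrix}1\\1\\3\\1\end{bmatrix} + (0)\begin{bmatrix}2\\1\\2\\-1\end{bmatrix} + (0)\begin{bmatrix}7\\3\\5\\-5\end{bmatrix} + (0)\begin{bmatrix}1\\1\\-1\\2\end{bmatrix} + (0)\begin{bmatrix}-1\\0\\9\\0\end{bmatrix} = \begin{bmatrix}0\\0\\0\\0\end{bmatrix}
\end{align*}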
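
Constructing elements of a span is just arithmetic, so it is easy to experiment. The Python sketch below rebuilds the four vectors w, x, y, z of this example from their scalars; the helper name `linear_combination` is hypothetical and introduced only for this illustration.

```python
def linear_combination(scalars, vectors):
    """Return scalars[0]*vectors[0] + scalars[1]*vectors[1] + ..."""
    size = len(vectors[0])
    return [sum(s * v[i] for s, v in zip(scalars, vectors)) for i in range(size)]


# The five vectors of the set S in Example ABS.
S = [
    [1, 1, 3, 1],
    [2, 1, 2, -1],
    [7, 3, 5, -5],
    [1, 1, -1, 2],
    [-1, 0, 9, 0],
]

w = linear_combination([2, 1, -1, 2, 3], S)
x = linear_combination([5, -6, -3, 4, 2], S)
y = linear_combination([1, 0, 1, 0, 1], S)
z = linear_combination([0, 0, 0, 0, 0], S)

print(w)  # [-4, 2, 28, 10]
print(x)  # [-26, -6, 2, 34]
print(y)  # [7, 4, 17, -4]
print(z)  # [0, 0, 0, 0]
```

Changing the list of scalars in any call produces another element of the span of S.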

The  purpose  of  a set  is  to  collect  objects  with  some  common  property,  and  to 
exclude  objects  without  that  property.  So  the  most  fundamental  question  about  a 
set is if a given object is an element of the set or not. Let us learn more about $\langle S \rangle$
by  investigating  which  vectors  are  elements  of  the  set,  and  which  are  not. 


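First, is $\mathbf{u} = \begin{bmatrix}-15\\-6\\19\\5\end{bmatrix}$ an element of $\langle S \rangle$? We are asking if there are scalars
$\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5$ such that

\[
\alpha_1\begin{bmatrix}1\\1\\3\\1\end{bmatrix} + \alpha_2\begin{bmatrix}2\\1\\2\\-1\end{bmatrix} + \alpha_3\begin{bmatrix}7\\3\\5\\-5\end{bmatrix} + \alpha_4\begin{bmatrix}1\\1\\-1\\2\end{bmatrix} + \alpha_5\begin{bmatrix}-1\\0\\9\\0\end{bmatrix} = \mathbf{u} = \begin{bmatrix}-15\\-6\\19\\5\end{bmatrix}
\]

Applying Theorem SLSLC we recognize the search for these scalars as a solution
to a linear system of equations with augmented matrix

\[
\begin{bmatrix}
1 & 2 & 7 & 1 & -1 & -15 \\
1 & 1 & 3 & 1 & 0 & -6 \\
3 & 2 & 5 & -1 & 9 & 19 \\
1 & -1 & -5 & 2 & 0 & 5
\end{bmatrix}
\]

which row-reduces to

\[
\begin{bmatrix}
1 & 0 & -1 & 0 & 3 & 10 \\
0 & 1 & 4 & 0 & -1 & -9 \\
0 & 0 & 0 & 1 & -2 & -7 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

At this point, we see that the system is consistent (Theorem RCLS), so we know
there is a solution for the five scalars $\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5$. This is enough evidence for
us to say that $\mathbf{u} \in \langle S \rangle$. If we wished further evidence, we could compute an actual
solution, say

\[
\alpha_1 = 2 \qquad \alpha_2 = 1 \qquad \alpha_3 = -2 \qquad \alpha_4 = -3 \qquad \alpha_5 = 2
\]

This particular solution allows us to write

\[
(2)\begin{bmatrix}1\\1\\3\\1\end{bmatrix} + (1)\begin{bmatrix}2\\1\\2\\-1\end{bmatrix} + (-2)\begin{bmatrix}7\\3\\5\\-5\end{bmatrix} + (-3)\begin{bmatrix}1\\1\\-1\\2\end{bmatrix} + (2)\begin{bmatrix}-1\\0\\9\\0\end{bmatrix} = \mathbf{u} = \begin{bmatrix}-15\\-6\\19\\5\end{bmatrix}
\]

making it even more obvious that $\mathbf{u} \in \langle S \rangle$.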

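Let us do it again. Is $\mathbf{v} = \begin{bmatrix}3\\1\\2\\-1\end{bmatrix}$ an element of $\langle S \rangle$? We are asking if there are
scalars $\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5$ such that

\[
\alpha_1\begin{bmatrix}1\\1\\3\\1\end{bmatrix} + \alpha_2\begin{bmatrix}2\\1\\2\\-1\end{bmatrix} + \alpha_3\begin{bmatrix}7\\3\\5\\-5\end{bmatrix} + \alpha_4\begin{bmatrix}1\\1\\-1\\2\end{bmatrix} + \alpha_5\begin{bmatrix}-1\\0\\9\\0\end{bmatrix} = \mathbf{v} = \begin{bmatrix}3\\1\\2\\-1\end{bmatrix}
\]

Applying Theorem SLSLC we recognize the search for these scalars as a solution
to a linear system of equations with augmented matrix

\[
\begin{bmatrix}
1 & 2 & 7 & 1 & -1 & 3 \\
1 & 1 & 3 & 1 & 0 & 1 \\
3 & 2 & 5 & -1 & 9 & 2 \\
1 & -1 & -5 & 2 & 0 & -1
\end{bmatrix}
\]

which row-reduces to

\[
\begin{bmatrix}
1 & 0 & -1 & 0 & 3 & 0 \\
0 & 1 & 4 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 & -2 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]

At this point, we see that the system is inconsistent by Theorem RCLS, so we
know there is not a solution for the five scalars $\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5$. This is enough
evidence for us to say that $\mathbf{v} \notin \langle S \rangle$. End of story. △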
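
The two decisions made in Example ABS both come down to one test on the row-reduced augmented matrix: is the final column a pivot column? A minimal Python sketch of that test follows. The function name `is_consistent` is a hypothetical helper, and it assumes the matrix it receives has already been brought to reduced row-echelon form.

```python
def is_consistent(rref_augmented):
    """Apply the test of Theorem RCLS to a row-reduced augmented matrix.

    The system is inconsistent exactly when some row has its leading
    (first nonzero) entry in the final column.
    """
    last = len(rref_augmented[0]) - 1
    for row in rref_augmented:
        leading = next((j for j, entry in enumerate(row) if entry != 0), None)
        if leading == last:
            return False
    return True


# Row-reduced augmented matrices from the questions about u and v.
rref_for_u = [
    [1, 0, -1, 0, 3, 10],
    [0, 1, 4, 0, -1, -9],
    [0, 0, 0, 1, -2, -7],
    [0, 0, 0, 0, 0, 0],
]
rref_for_v = [
    [1, 0, -1, 0, 3, 0],
    [0, 1, 4, 0, -1, 0],
    [0, 0, 0, 1, -2, 0],
    [0, 0, 0, 0, 0, 1],
]
print(is_consistent(rref_for_u), is_consistent(rref_for_v))  # True False
```

The first result says u is in the span of S, and the second says v is not.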


Example  SCAA  Span  of  the  columns  of  Archetype  A 
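Begin with the finite set of three vectors of size 3

\[
S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\} = \left\{
\begin{bmatrix}1\\2\\1\end{bmatrix},\
\begin{bmatrix}-1\\1\\1\end{bmatrix},\
\begin{bmatrix}2\\1\\0\end{bmatrix}
\right\}
\]

and consider the infinite set $\langle S \rangle$. The vectors of $S$ could have been chosen to be
anything, but for reasons that will become clear later, we have chosen the three
columns of the coefficient matrix in Archetype A.

First, as an example, note that

\[
\mathbf{v} = (5)\begin{bmatrix}1\\2\\1\end{bmatrix} + (-3)\begin{bmatrix}-1\\1\\1\end{bmatrix} + (7)\begin{bmatrix}2\\1\\0\end{bmatrix} = \begin{bmatrix}22\\14\\2\end{bmatrix}
\]

is in $\langle S \rangle$, since it is a linear combination of $\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$. We write this succinctly as
$\mathbf{v} \in \langle S \rangle$. There is nothing magical about the scalars $\alpha_1 = 5$, $\alpha_2 = -3$, $\alpha_3 = 7$; they
could have been chosen to be anything. So repeat this part of the example yourself,
using different values of $\alpha_1$, $\alpha_2$, $\alpha_3$. What happens if you choose all three scalars to
be zero?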

So we know how to quickly construct sample elements of the set $\langle S \rangle$. A slightly
different question arises when you are handed a vector of the correct size and asked
if it is an element of $\langle S \rangle$. For example, is $\mathbf{w} = \begin{bmatrix}1\\8\\5\end{bmatrix}$ in $\langle S \rangle$? More succinctly, $\mathbf{w} \in \langle S \rangle$?

To answer this question, we will look for scalars $\alpha_1$, $\alpha_2$, $\alpha_3$ so that

\[
\alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 = \mathbf{w}
\]

By Theorem SLSLC solutions to this vector equation are solutions to the system of
equations

\begin{align*}
\alpha_1 - \alpha_2 + 2\alpha_3 &= 1 \\
2\alpha_1 + \alpha_2 + \alpha_3 &= 8 \\
\alpha_1 + \alpha_2 &= 5
\end{align*}

Building the augmented matrix for this linear system, and row-reducing, gives

\[
\begin{bmatrix}
1 & 0 & 1 & 3 \\
0 & 1 & -1 & 2 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]

This system has infinitely many solutions ($\alpha_3$ is a free variable), but all
we need is one solution vector. The solution,

\[
\alpha_1 = 2 \qquad \alpha_2 = 3 \qquad \alpha_3 = 1
\]

tells us that

\[
(2)\mathbf{u}_1 + (3)\mathbf{u}_2 + (1)\mathbf{u}_3 = \mathbf{w}
\]

so we are convinced that $\mathbf{w}$ really is in $\langle S \rangle$. Notice that there are an infinite number
of ways to answer this question affirmatively. We could choose a different solution,
this time choosing the free variable to be zero,

\[
\alpha_1 = 3 \qquad \alpha_2 = 2 \qquad \alpha_3 = 0
\]

which shows us that

\[
(3)\mathbf{u}_1 + (2)\mathbf{u}_2 + (0)\mathbf{u}_3 = \mathbf{w}
\]

Verifying the arithmetic in this second solution will make it obvious that $\mathbf{w}$ is in
this span. And of course, we now realize that there are an infinite number of ways
to realize $\mathbf{w}$ as an element of $\langle S \rangle$.


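Let us ask the same type of question again, but this time with $\mathbf{y} = \begin{bmatrix}2\\4\\3\end{bmatrix}$, i.e. is
$\mathbf{y} \in \langle S \rangle$?

So we will look for scalars $\alpha_1$, $\alpha_2$, $\alpha_3$ so that

\[
\alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 = \mathbf{y}
\]

By Theorem SLSLC solutions to this vector equation are the solutions to the system
of equations

\begin{align*}
\alpha_1 - \alpha_2 + 2\alpha_3 &= 2 \\
2\alpha_1 + \alpha_2 + \alpha_3 &= 4 \\
\alpha_1 + \alpha_2 &= 3
\end{align*}

Building the augmented matrix for this linear system, and row-reducing, gives

\[
\begin{bmatrix}
1 & 0 & 1 & 0 \\
0 & 1 & -1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]

This system is inconsistent (there is a pivot column in the last column, Theorem
RCLS), so there are no scalars $\alpha_1$, $\alpha_2$, $\alpha_3$ that will create a linear combination of
$\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$ that equals $\mathbf{y}$. More precisely, $\mathbf{y} \notin \langle S \rangle$.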

There are three things to observe in this example. (1) It is easy to construct
vectors in $\langle S \rangle$. (2) It is possible that some vectors are in $\langle S \rangle$ (e.g. $\mathbf{w}$), while others
are not (e.g. $\mathbf{y}$). (3) Deciding if a given vector is in $\langle S \rangle$ leads to solving a linear
system of equations and asking if the system is consistent.
With a computer program in hand to solve systems of linear equations, could
you create a program to decide if a vector was, or was not, in the span of a given set
of vectors? Is this art or science?
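
One possible answer to the question just posed is sketched below in Python, using exact rational arithmetic from the standard library. This is only an illustrative sketch: the function names `rref` and `in_span` are assumptions of this example (they are not Sage commands), and the row-reduction routine is a plain Gauss-Jordan elimination rather than any particular procedure from the text.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # Look for a nonzero entry in this column, at or below pivot_row.
        candidates = [r for r in range(pivot_row, len(m)) if m[r][col] != 0]
        if not candidates:
            continue
        r = candidates[0]
        m[pivot_row], m[r] = m[r], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        # Clear the rest of the column, above and below the leading 1.
        for other in range(len(m)):
            if other != pivot_row and m[other][col] != 0:
                factor = m[other][col]
                m[other] = [a - factor * p for a, p in zip(m[other], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m


def in_span(vectors, target):
    """Decide whether target is a linear combination of the given vectors."""
    # Augmented matrix: the vectors as columns, then the target as last column.
    augmented = [[v[i] for v in vectors] + [target[i]] for i in range(len(target))]
    last = len(vectors)
    for row in rref(augmented):
        leading = next((j for j, x in enumerate(row) if x != 0), None)
        if leading == last:
            return False  # pivot in the final column: inconsistent system
    return True


# The set S of Example SCAA, tested against w and y.
S = [[1, 2, 1], [-1, 1, 1], [2, 1, 0]]
print(in_span(S, [1, 8, 5]))  # True
print(in_span(S, [2, 4, 3]))  # False
```

The design mirrors observation (3): membership in a span is decided entirely by the consistency of one linear system.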

This example was built on vectors from the columns of the coefficient matrix of
Archetype A. Study the determination that $\mathbf{v} \in \langle S \rangle$ and see if you can connect it
with some of the other properties of Archetype A. △

Having  analyzed  Archetype  A in  Example  SCAA,  we  will  of  course  subject 
Archetype  B to  a similar  investigation. 


Example  SCAB  Span  of  the  columns  of  Archetype  B 

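Begin with the finite set of three vectors of size 3 that are the columns of the
coefficient matrix in Archetype B,

\[
R = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\} = \left\{
\begin{bmatrix}-7\\5\\1\end{bmatrix},\
\begin{bmatrix}-6\\5\\0\end{bmatrix},\
\begin{bmatrix}-12\\7\\4\end{bmatrix}
\right\}
\]

and consider the infinite set $\langle R \rangle$.
First, as an example, note that

\[
\mathbf{x} = (2)\begin{bmatrix}-7\\5\\1\end{bmatrix} + (4)\begin{bmatrix}-6\\5\\0\end{bmatrix} + (-3)\begin{bmatrix}-12\\7\\4\end{bmatrix} = \begin{bmatrix}-2\\9\\-10\end{bmatrix}
\]

is in $\langle R \rangle$, since it is a linear combination of $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$. In other words, $\mathbf{x} \in \langle R \rangle$. Try
some different values of $\alpha_1$, $\alpha_2$, $\alpha_3$ yourself, and see what vectors you can create as
elements of $\langle R \rangle$.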


Now ask if a given vector is an element of $\langle R \rangle$. For example, is $\mathbf{z} = \begin{bmatrix}-33\\24\\5\end{bmatrix}$ in
$\langle R \rangle$? Is $\mathbf{z} \in \langle R \rangle$?

To answer this question, we will look for scalars $\alpha_1$, $\alpha_2$, $\alpha_3$ so that

\[
\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \alpha_3\mathbf{v}_3 = \mathbf{z}
\]

By Theorem SLSLC solutions to this vector equation are the solutions to the system
of equations

\begin{align*}
-7\alpha_1 - 6\alpha_2 - 12\alpha_3 &= -33 \\
5\alpha_1 + 5\alpha_2 + 7\alpha_3 &= 24 \\
\alpha_1 + 4\alpha_3 &= 5
\end{align*}

Building the augmented matrix for this linear system, and row-reducing, gives

\[
\begin{bmatrix}
1 & 0 & 0 & -3 \\
0 & 1 & 0 & 5 \\
0 & 0 & 1 & 2
\end{bmatrix}
\]

This system has a unique solution,

\[
\alpha_1 = -3 \qquad \alpha_2 = 5 \qquad \alpha_3 = 2
\]

telling us that

\[
(-3)\mathbf{v}_1 + (5)\mathbf{v}_2 + (2)\mathbf{v}_3 = \mathbf{z}
\]

so we are convinced that $\mathbf{z}$ really is in $\langle R \rangle$. Notice that in this case we have only
one way to answer the question affirmatively since the solution is unique.


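Let us ask about another vector, say is $\mathbf{x} = \begin{bmatrix}-7\\8\\-3\end{bmatrix}$ in $\langle R \rangle$? Is $\mathbf{x} \in \langle R \rangle$?

We desire scalars $\alpha_1$, $\alpha_2$, $\alpha_3$ so that

\[
\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \alpha_3\mathbf{v}_3 = \mathbf{x}
\]

By Theorem SLSLC solutions to this vector equation are the solutions to the system
of equations

\begin{align*}
-7\alpha_1 - 6\alpha_2 - 12\alpha_3 &= -7 \\
5\alpha_1 + 5\alpha_2 + 7\alpha_3 &= 8 \\
\alpha_1 + 4\alpha_3 &= -3
\end{align*}

Building the augmented matrix for this linear system, and row-reducing, gives

\[
\begin{bmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 2 \\
0 & 0 & 1 & -1
\end{bmatrix}
\]

This system has a unique solution,

\[
\alpha_1 = 1 \qquad \alpha_2 = 2 \qquad \alpha_3 = -1
\]

telling us that

\[
(1)\mathbf{v}_1 + (2)\mathbf{v}_2 + (-1)\mathbf{v}_3 = \mathbf{x}
\]

so we are convinced that $\mathbf{x}$ really is in $\langle R \rangle$. Notice that in this case we again have
only one way to answer the question affirmatively since the solution is again unique.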

We could continue to test other vectors for membership in $\langle R \rangle$, but there is no
point. A question about membership in $\langle R \rangle$ inevitably leads to a system of three
equations in the three variables $\alpha_1$, $\alpha_2$, $\alpha_3$ with a coefficient matrix whose columns
are the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$. This particular coefficient matrix is nonsingular, so by
Theorem NMUS, the system is guaranteed to have a solution. (This solution is
unique, but that is not critical here.) So no matter which vector we might have
chosen for $\mathbf{z}$, we would have been certain to discover that it was an element of $\langle R \rangle$.
Stated differently, every vector of size 3 is in $\langle R \rangle$, or $\langle R \rangle = \mathbb{C}^3$.
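
The claim that every vector of size 3 lies in the span of R can be made concrete with a short Python sketch. The idea, which is an illustration added here and not the argument of the text, is that once each of the vectors (1, 0, 0), (0, 1, 0), (0, 0, 1) is written as a combination of the three vectors in R, any target vector follows by combining those three recipes. The scalars stored in `UNIT_SCALARS` were worked out separately for this sketch and are verified by the code rather than taken on trust; the names `UNIT_SCALARS` and `scalars_for` are hypothetical.

```python
from fractions import Fraction as Fr


def linear_combination(scalars, vectors):
    """Return scalars[0]*vectors[0] + scalars[1]*vectors[1] + ..."""
    size = len(vectors[0])
    return [sum(s * v[i] for s, v in zip(scalars, vectors)) for i in range(size)]


# The vectors v1, v2, v3 of Example SCAB.
R = [[-7, 5, 1], [-6, 5, 0], [-12, 7, 4]]

# Scalars producing (1,0,0), (0,1,0), (0,0,1) from v1, v2, v3 (checked below).
UNIT_SCALARS = [
    [Fr(-10), Fr(13, 2), Fr(5, 2)],
    [Fr(-12), Fr(8), Fr(3)],
    [Fr(-9), Fr(11, 2), Fr(5, 2)],
]
for k, scalars in enumerate(UNIT_SCALARS):
    expected = [1 if i == k else 0 for i in range(3)]
    assert linear_combination(scalars, R) == expected


def scalars_for(target):
    """Scalars a1, a2, a3 with a1*v1 + a2*v2 + a3*v3 equal to target."""
    return [sum(target[k] * UNIT_SCALARS[k][i] for k in range(3)) for i in range(3)]


print(scalars_for([-33, 24, 5]))
print(scalars_for([-7, 8, -3]))
```

The two calls reproduce the scalars -3, 5, 2 and 1, 2, -1 found above, and `scalars_for` succeeds for any other choice of target as well.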

Compare this example with Example SCAA, and see if you can connect $\mathbf{z}$ with
some aspects of the write-up for Archetype B. △


Subsection  SSNS 
Spanning  Sets  of  Null  Spaces 


We saw in Example VFSAL that when a system of equations is homogeneous the
solution set can be expressed in the form described by Theorem VFSLS where the
vector $\mathbf{c}$ is the zero vector. We can essentially ignore this vector, so that the remainder
of the typical expression for a solution looks like an arbitrary linear combination,
where the scalars are the free variables and the vectors are $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_{n-r}$.
Which sounds a lot like a span. This is the substance of the next theorem.

Theorem SSNS Spanning Sets for Null Spaces

Suppose that $A$ is an $m \times n$ matrix, and $B$ is a row-equivalent matrix in
reduced row-echelon form. Suppose that $B$ has $r$ pivot columns, with indices given
by $D = \{d_1, d_2, d_3, \ldots, d_r\}$, while the $n - r$ non-pivot columns have indices
$F = \{f_1, f_2, f_3, \ldots, f_{n-r}, n + 1\}$. Construct the $n - r$ vectors $\mathbf{z}_j$, $1 \le j \le n - r$ of size
$n$,

\[
[\mathbf{z}_j]_i = \begin{cases}
1 & \text{if } i \in F,\ i = f_j \\
0 & \text{if } i \in F,\ i \ne f_j \\
-[B]_{k, f_j} & \text{if } i \in D,\ i = d_k
\end{cases}
\]

Then the null space of $A$ is given by

\[
\mathcal{N}(A) = \langle \{\mathbf{z}_1, \mathbf{z}_2, \mathbf{z}_3, \ldots, \mathbf{z}_{n-r}\} \rangle
\]

Proof. Consider the homogeneous system with $A$ as a coefficient matrix, $\mathcal{LS}(A, \mathbf{0})$.
Its set of solutions, $S$, is by Definition NSM, the null space of $A$, $\mathcal{N}(A)$. Let $B'$
denote the result of row-reducing the augmented matrix of this homogeneous system.
Since the system is homogeneous, the final column of the augmented matrix will be
all zeros, and after any number of row operations (Definition RO), the column will
still be all zeros. So $B'$ has a final column that is totally zeros.

Now apply Theorem VFSLS to $B'$, after noting that our homogeneous system
must be consistent (Theorem HSC). The vector $\mathbf{c}$ has zeros for each entry that has
an index in $F$. For entries with their index in $D$, the value is $-[B']_{k, n+1}$, but for
$B'$ any entry in the final column (index $n + 1$) is zero. So $\mathbf{c} = \mathbf{0}$. The vectors $\mathbf{z}_j$,
$1 \le j \le n - r$ are identical to the vectors $\mathbf{u}_j$, $1 \le j \le n - r$ described in Theorem
VFSLS. Putting it all together and applying Definition SSCV in the final step,

\begin{align*}
\mathcal{N}(A) &= S \\
&= \{\mathbf{c} + \alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \cdots + \alpha_{n-r}\mathbf{u}_{n-r} \mid \alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_{n-r} \in \mathbb{C}\} \\
&= \{\alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \cdots + \alpha_{n-r}\mathbf{u}_{n-r} \mid \alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_{n-r} \in \mathbb{C}\} \\
&= \langle \{\mathbf{z}_1, \mathbf{z}_2, \mathbf{z}_3, \ldots, \mathbf{z}_{n-r}\} \rangle
\end{align*}

■

Notice that the hypotheses of Theorem VFSLS and Theorem SSNS are slightly
different. In the former, $B$ is the row-reduced version of an augmented matrix of
a linear system, while in the latter, $B$ is the row-reduced version of an arbitrary
matrix. Understanding this subtlety now will avoid confusion later.

Example  SSNS  Spanning  set  of  a null  space 

Find a set of vectors, $S$, so that the null space of the matrix $A$ below is the span of
$S$, that is, $\langle S \rangle = \mathcal{N}(A)$.

\[
A = \begin{bmatrix}
1 & 3 & 3 & -1 & -5 \\
2 & 5 & 7 & 1 & 1 \\
1 & 1 & 5 & 1 & 5 \\
-1 & -4 & -2 & 0 & 4
\end{bmatrix}
\]

The null space of $A$ is the set of all solutions to the homogeneous system $\mathcal{LS}(A, \mathbf{0})$.
If we find the vector form of the solutions to this homogeneous system (Theorem
VFSLS) then the vectors $\mathbf{u}_j$, $1 \le j \le n - r$ in the linear combination are exactly the
vectors $\mathbf{z}_j$, $1 \le j \le n - r$ described in Theorem SSNS. So we can mimic Example
VFSAL to arrive at these vectors (rather than being a slave to the formulas in the
statement of the theorem).

Begin by row-reducing $A$. The result is

\[
\begin{bmatrix}
1 & 0 & 6 & 0 & 4 \\
0 & 1 & -1 & 0 & -2 \\
0 & 0 & 0 & 1 & 3 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

With $D = \{1, 2, 4\}$ and $F = \{3, 5\}$ we recognize that $x_3$ and $x_5$ are free variables
and we can interpret each nonzero row as an expression for the dependent variables
$x_1$, $x_2$, $x_4$ (respectively) in the free variables $x_3$ and $x_5$. With this we can write the
vector form of a solution vector as

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
= \begin{bmatrix} -6x_3 - 4x_5 \\ x_3 + 2x_5 \\ x_3 \\ -3x_5 \\ x_5 \end{bmatrix}
= x_3 \begin{bmatrix} -6 \\ 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ x_5 \begin{bmatrix} -4 \\ 2 \\ 0 \\ -3 \\ 1 \end{bmatrix}
\]

Then in the notation of Theorem SSNS,

\[
\mathbf{z}_1 = \begin{bmatrix} -6 \\ 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}
\qquad
\mathbf{z}_2 = \begin{bmatrix} -4 \\ 2 \\ 0 \\ -3 \\ 1 \end{bmatrix}
\]

and

\[
\mathcal{N}(A) = \langle \{\mathbf{z}_1, \mathbf{z}_2\} \rangle
\]

△


Example  NSDS  Null  space  directly  as  a span 

Let  us  express  the  null  space  of  A as  the  span  of  a set  of  vectors,  applying  Theorem 
SSNS  as  economically  as  possible,  without  reference  to  the  underlying  homogeneous 
system  of  equations  (in  contrast  to  Example  SSNS). 


' 2 

1 

5 

1 

5 

1 ' 

1 

1 

3 

1 

6 

-1 

-1 

1 

-1 

0 

4 

—3 

-3 

2 

—4 

—4 

-7 

0 

3 

-1 

5 

2 

2 

3 

Theorem  SSNS  creates  vectors  for  the  span  by  first  row-reducing  the  matrix  in 


question.  The  row-reduced  version 

of  A 

is 

B 

0 

2 

0 

-1 

2 ' 

0 

0 

1 

0 

3 

-1 

B = 

0 

0 

0 

0 

4 

-2 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

We  will  mechanically  follow  the  prescription  of  Theorem  SSNS.  Here  we  go,  in 
two  big  steps. 

First,  the  non-pivot  columns  have  indices  F = {3,  5,  6},  so  we  will  construct  the 
n — r = 6 — 3 = 3 vectors  with  a pattern  of  zeros  and  ones  dictated  by  the  indices 
in  F.  This  is  the  realization  of  the  first  two  lines  of  the  three-case  definition  of  the 
vectors  z;- , 1 < j < n — r. 


1 

0 

0 

Z2  = 

Z3  = 

0 

1 

0 

0 

0 

1 

Each of these vectors arises due to the presence of a column that is not a pivot column. The remaining entries of each vector are the entries of the non-pivot column, negated, and distributed into the empty slots in order (these slots have indices in the set $D$, so also refer to pivot columns). This is the realization of the third line of the three-case definition of the vectors $\mathbf{z}_j$, $1 \le j \le n - r$.

\[
\mathbf{z}_1 = \begin{bmatrix} -2 \\ -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\qquad
\mathbf{z}_2 = \begin{bmatrix} 1 \\ -3 \\ 0 \\ -4 \\ 1 \\ 0 \end{bmatrix}
\qquad
\mathbf{z}_3 = \begin{bmatrix} -2 \\ 1 \\ 0 \\ 2 \\ 0 \\ 1 \end{bmatrix}
\]

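So, by Theorem SSNS, we have

\[
\mathcal{N}(A) = \langle \{\mathbf{z}_1,\, \mathbf{z}_2,\, \mathbf{z}_3\} \rangle
= \left\langle \left\{
\begin{bmatrix} -2 \\ -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix},\,
\begin{bmatrix} 1 \\ -3 \\ 0 \\ -4 \\ 1 \\ 0 \end{bmatrix},\,
\begin{bmatrix} -2 \\ 1 \\ 0 \\ 2 \\ 0 \\ 1 \end{bmatrix}
\right\} \right\rangle
\]

We know that the null space of $A$ is the solution set of the homogeneous system $\mathcal{LS}(A, \mathbf{0})$, but nowhere in this application of Theorem SSNS have we found occasion to reference the variables or equations of this system. These details are all buried in the proof of Theorem SSNS. △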
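
The two big steps of Example NSDS are mechanical enough to automate. What follows is a minimal Python sketch of that recipe, offered as an illustration rather than as the text's own Sage material. The helper names `rref` and `null_space_span` are our own, exact arithmetic with `fractions.Fraction` is an implementation choice, and indices in the code are 0-based, so the set $F = \{3, 5, 6\}$ shows up as the columns 2, 4 and 5.

```python
from fractions import Fraction

def rref(rows):
    """Return (B, pivots): the reduced row-echelon form of the matrix and
    the 0-based indices of its pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c, at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # column c is a non-pivot column
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]      # scale to get a leading 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:      # clear the rest of column c
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_space_span(rows):
    """Build the vectors z_j described by Theorem SSNS."""
    B, D = rref(rows)
    n = len(rows[0])
    F = [c for c in range(n) if c not in D]
    vectors = []
    for f in F:
        z = [Fraction(0)] * n
        z[f] = Fraction(1)                   # pattern of zeros and ones
        for k, d in enumerate(D):
            z[d] = -B[k][f]                  # negated non-pivot column entries
        vectors.append(z)
    return vectors

A = [[ 2,  1,  5,  1,  5,  1],
     [ 1,  1,  3,  1,  6, -1],
     [-1,  1, -1,  0,  4, -3],
     [-3,  2, -4, -4, -7,  0],
     [ 3, -1,  5,  2,  2,  3]]

for z in null_space_span(A):
    print([int(x) for x in z])
# prints [-2, -1, 1, 0, 0, 0], then [1, -3, 0, -4, 1, 0], then [-2, 1, 0, 2, 0, 1]
```

Notice that the function never mentions variables or equations, in keeping with the spirit of the example.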


Here  is  an  example  that  will  simultaneously  exercise  the  span  construction  and 
Theorem  SSNS,  while  also  pointing  the  way  to  the  next  section. 


Example  SCAD  Span  of  the  columns  of  Archetype  D 
Begin  with  the  set  of  four  vectors  of  size  3 


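\[
T = \{\mathbf{w}_1,\, \mathbf{w}_2,\, \mathbf{w}_3,\, \mathbf{w}_4\}
\]

and consider the infinite set $W = \langle T \rangle$. The vectors of $T$ have been chosen as the four columns of the coefficient matrix in Archetype D. Check that the vector

\[
\mathbf{z}_2 = \begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix}
\]

is a solution to the homogeneous system $\mathcal{LS}(D, \mathbf{0})$ (it is the vector $\mathbf{z}_2$ provided by the description of the null space of the coefficient matrix $D$ from Theorem SSNS). Applying Theorem SLSLC, we can write the linear combination,

\[
2\mathbf{w}_1 + 3\mathbf{w}_2 + 0\mathbf{w}_3 + 1\mathbf{w}_4 = \mathbf{0}
\]

which we can solve for $\mathbf{w}_4$,

\[
\mathbf{w}_4 = (-2)\mathbf{w}_1 + (-3)\mathbf{w}_2.
\]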

This equation says that whenever we encounter the vector $\mathbf{w}_4$, we can replace it with a specific linear combination of the vectors $\mathbf{w}_1$ and $\mathbf{w}_2$. So using $\mathbf{w}_4$ in the set $T$, along with $\mathbf{w}_1$ and $\mathbf{w}_2$, is excessive. An example of what we mean here can be illustrated by the computation,

\begin{align*}
5\mathbf{w}_1 + (-4)\mathbf{w}_2 + 6\mathbf{w}_3 + (-3)\mathbf{w}_4
&= 5\mathbf{w}_1 + (-4)\mathbf{w}_2 + 6\mathbf{w}_3 + (-3)\left((-2)\mathbf{w}_1 + (-3)\mathbf{w}_2\right) \\
&= 5\mathbf{w}_1 + (-4)\mathbf{w}_2 + 6\mathbf{w}_3 + \left(6\mathbf{w}_1 + 9\mathbf{w}_2\right) \\
&= 11\mathbf{w}_1 + 5\mathbf{w}_2 + 6\mathbf{w}_3
\end{align*}

So what began as a linear combination of the vectors $\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3, \mathbf{w}_4$ has been reduced to a linear combination of the vectors $\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3$. A careful proof using our definition of set equality (Definition SE) would now allow us to conclude that this reduction is possible for any vector in $W$, so

\[
W = \langle \{\mathbf{w}_1,\, \mathbf{w}_2,\, \mathbf{w}_3\} \rangle
\]

So the span of our set of vectors, $W$, has not changed, but we have described it by the span of a set of three vectors, rather than four. Furthermore, we can achieve yet another, similar, reduction.

Check that the vector

\[
\mathbf{z}_1 = \begin{bmatrix} -3 \\ -1 \\ 1 \\ 0 \end{bmatrix}
\]

is a solution to the homogeneous system $\mathcal{LS}(D, \mathbf{0})$ (it is the vector $\mathbf{z}_1$ provided by the description of the null space of the coefficient matrix $D$ from Theorem SSNS). Applying Theorem SLSLC, we can write the linear combination,

\[
(-3)\mathbf{w}_1 + (-1)\mathbf{w}_2 + 1\mathbf{w}_3 = \mathbf{0}
\]

which we can solve for $\mathbf{w}_3$,

\[
\mathbf{w}_3 = 3\mathbf{w}_1 + 1\mathbf{w}_2
\]

This equation says that whenever we encounter the vector $\mathbf{w}_3$, we can replace it with a specific linear combination of the vectors $\mathbf{w}_1$ and $\mathbf{w}_2$. So, as before, the vector $\mathbf{w}_3$ is not needed in the description of $W$, provided we have $\mathbf{w}_1$ and $\mathbf{w}_2$ available. In particular, a careful proof (such as is done in Example RSC5) would show that

\[
W = \langle \{\mathbf{w}_1,\, \mathbf{w}_2\} \rangle
\]

So $W$ began life as the span of a set of four vectors, and we have now shown (utilizing solutions to a homogeneous system) that $W$ can also be described as the span of a set of just two vectors. Convince yourself that we cannot go any further. In other words, it is not possible to dismiss either $\mathbf{w}_1$ or $\mathbf{w}_2$ in a similar fashion and winnow the set down to just one vector.

What was it about the original set of four vectors that allowed us to declare certain vectors as surplus? And just which vectors were we able to dismiss? And why did we have to stop once we had two vectors remaining? The answers to these questions motivate “linear independence,” our next section and next definition, and so are worth considering carefully now. △
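
The replacements made in Example SCAD are easy to check by direct computation. The Python sketch below assumes the four columns of the coefficient matrix of Archetype D are the vectors written in the first line; they are not listed in this example, so treat those entries as an assumption. The helper `combo` is a hypothetical name for forming a linear combination.

```python
# Assumed columns of the coefficient matrix of Archetype D
# (not listed in the example itself).
w1, w2, w3, w4 = (2, -3, 1), (1, 4, 1), (7, -5, 4), (-7, -6, -5)

def combo(scalars, vectors):
    """Linear combination of equal-length tuples with the given scalars."""
    size = len(vectors[0])
    return tuple(sum(a * v[i] for a, v in zip(scalars, vectors))
                 for i in range(size))

zero = (0, 0, 0)

# z2 = (2, 3, 0, 1) and z1 = (-3, -1, 1, 0) each give a combination of the
# columns that equals the zero vector (Theorem SLSLC).
print(combo((2, 3, 0, 1), (w1, w2, w3, w4)) == zero)
print(combo((-3, -1, 1, 0), (w1, w2, w3, w4)) == zero)

# Solving those two equations for w4 and for w3.
print(w4 == combo((-2, -3), (w1, w2)))
print(w3 == combo((3, 1), (w1, w2)))

# The sample computation: a combination of four vectors rewritten with three.
lhs = combo((5, -4, 6, -3), (w1, w2, w3, w4))
rhs = combo((11, 5, 6), (w1, w2, w3))
print(lhs == rhs)
```

Under the stated assumption every line prints `True`, which is just the arithmetic of the example carried out entry by entry.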

Reading  Questions 


1.  Let  S be  the  set  of  three  vectors  below. 


Let $W = \langle S \rangle$ be the span of $S$. Is the vector


in $W$? Give an explanation of the reason for your answer.


2. Use $S$ and $W$ from the previous question. Is the vector


in $W$? Give an explanation of the reason for your answer.


3. For the matrix $A$ below, find a set $S$ so that $\langle S \rangle = \mathcal{N}(A)$, where $\mathcal{N}(A)$ is the null space of $A$. (See Theorem SSNS.)

\[
A = \begin{bmatrix}
1 & 3 & 1 & 9 \\
2 & 1 & -3 & 8 \\
1 & 1 & -1 & 5
\end{bmatrix}
\]


Exercises 

C22† For each archetype that is a system of equations, consider the corresponding homogeneous system of equations. Write elements of the solution set to these homogeneous systems in vector form, as guaranteed by Theorem VFSLS. Then write the null space of the coefficient matrix of each system as the span of a set of vectors, as described in Theorem SSNS.

Archetype A, Archetype B, Archetype C, Archetype D/Archetype E, Archetype F, Archetype G/Archetype H, Archetype I, Archetype J

C23† Archetype K and Archetype L are defined as matrices. Use Theorem SSNS directly to find a set $S$ so that $\langle S \rangle$ is the null space of the matrix. Do not make any reference to the associated homogeneous system of equations in your solution.


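C40† Suppose that $S = \left\{ \begin{bmatrix} 2 \\ -1 \\ 3 \\ 4 \end{bmatrix},\, \begin{bmatrix} 3 \\ 2 \\ -2 \\ 1 \end{bmatrix} \right\}$. Let $W = \langle S \rangle$ and let $\mathbf{x} = \begin{bmatrix} 5 \\ 8 \\ -12 \\ -5 \end{bmatrix}$. Is $\mathbf{x} \in W$? If so, provide an explicit linear combination that demonstrates this.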
C41† Suppose that S =


' 3 1 1 F' 

_^2  / • Let  W = (S)  and  let  y = ^ . Is  y £ 

1 J 5 


so,  provide  an  explicit  linear  combination  that  demonstrates  this. 


C42^  Suppose  R = 


C43t  Suppose  R = 


-1  -2 


-1  -2 


‘ 1 _ 

-1 

Is  y = -8  in  (R)? 

-4 
-3 


Is  z = 5 in  (R)? 
3 
1 


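C44† Suppose that $S = \left\{ \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix},\, \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix},\, \begin{bmatrix} 1 \\ 5 \\ 4 \end{bmatrix},\, \begin{bmatrix} -6 \\ 5 \\ 1 \end{bmatrix} \right\}$. Let $W = \langle S \rangle$ and let $\mathbf{y} =$
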


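Is $\mathbf{y} \in W$? If so, provide an explicit linear combination that demonstrates this.

C45† Suppose that $S = \left\{ \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix},\, \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix},\, \begin{bmatrix} 1 \\ 5 \\ 4 \end{bmatrix},\, \begin{bmatrix} -6 \\ 5 \\ 1 \end{bmatrix} \right\}$. Let $W = \langle S \rangle$ and let $\mathbf{w} =$

Is $\mathbf{w} \in W$? If so, provide an explicit linear combination that demonstrates this.

C50† Let $A$ be the matrix below.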

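1. Find a set $S$ so that $\mathcal{N}(A) = \langle S \rangle$.

2. If $\mathbf{z} = \begin{bmatrix} 3 \\ -5 \\ 1 \\ 2 \end{bmatrix}$, then show directly that $\mathbf{z} \in \mathcal{N}(A)$.

3. Write $\mathbf{z}$ as a linear combination of the vectors in $S$.

\[
A = \begin{bmatrix}
2 & 3 & 1 & 4 \\
1 & 2 & 1 & 3 \\
-1 & 0 & 1 & 1
\end{bmatrix}
\]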

C60† For the matrix $A$ below, find a set of vectors $S$ so that the span of $S$ equals the null space of $A$, $\langle S \rangle = \mathcal{N}(A)$.

\[
A = \begin{bmatrix}
1 & 1 & 6 & -8 \\
1 & -2 & 0 & 1 \\
-2 & 1 & -6 & 7
\end{bmatrix}
\]

M10† Consider the set of all size 2 vectors in the Cartesian plane $\mathbb{R}^2$.

1. Give a geometric description of the span of a single vector.

2. How can you tell if two vectors span the entire plane, without doing any row reduction or calculation?

M11† Consider the set of all size 3 vectors in Cartesian 3-space $\mathbb{R}^3$.

1. Give a geometric description of the span of a single vector.

2. Describe the possibilities for the span of two vectors.

3. Describe the possibilities for the span of three vectors.

M12† Let $\mathbf{u} = \begin{bmatrix} 1 \\ 3 \\ -2 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix}$.

1. Find a vector $\mathbf{w}_1$, different from $\mathbf{u}$ and $\mathbf{v}$, so that $\langle \{\mathbf{u}, \mathbf{v}, \mathbf{w}_1\} \rangle = \langle \{\mathbf{u}, \mathbf{v}\} \rangle$.

2. Find a vector $\mathbf{w}_2$ so that $\langle \{\mathbf{u}, \mathbf{v}, \mathbf{w}_2\} \rangle \ne \langle \{\mathbf{u}, \mathbf{v}\} \rangle$.

M20  In  Example  SCAD  we  began  with  the  four  columns  of  the  coefficient  matrix  of 
Archetype  D,  and  used  these  columns  in  a span  construction.  Then  we  methodically  argued 
that  we  could  remove  the  last  column,  then  the  third  column,  and  create  the  same  set  by 
just  doing  a span  construction  with  the  first  two  columns.  We  claimed  we  could  not  go  any 
further,  and  had  removed  as  many  vectors  as  possible.  Provide  a convincing  argument  for 
why  a third  vector  cannot  be  removed. 

M21† In the spirit of Example SCAD, begin with the four columns of the coefficient
matrix  of  Archetype  C,  and  use  these  columns  in  a span  construction  to  build  the  set  S. 
Argue  that  S can  be  expressed  as  the  span  of  just  three  of  the  columns  of  the  coefficient 
matrix  (saying  exactly  which  three)  and  in  the  spirit  of  Exercise  SS.M20  argue  that  no  one 
of  these  three  vectors  can  be  removed  and  still  have  a span  construction  create  S. 

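T10† Suppose that $\mathbf{v}_1, \mathbf{v}_2 \in \mathbb{C}^m$. Prove that

\[
\langle \{\mathbf{v}_1,\, \mathbf{v}_2\} \rangle = \langle \{\mathbf{v}_1,\, \mathbf{v}_2,\, 5\mathbf{v}_1 + 3\mathbf{v}_2\} \rangle
\]

T20† Suppose that $S$ is a set of vectors from $\mathbb{C}^m$. Prove that the zero vector, $\mathbf{0}$, is an element of $\langle S \rangle$.

T21 Suppose that $S$ is a set of vectors from $\mathbb{C}^m$ and $\mathbf{x}, \mathbf{y} \in \langle S \rangle$. Prove that $\mathbf{x} + \mathbf{y} \in \langle S \rangle$.

T22 Suppose that $S$ is a set of vectors from $\mathbb{C}^m$, $\alpha \in \mathbb{C}$, and $\mathbf{x} \in \langle S \rangle$. Prove that $\alpha\mathbf{x} \in \langle S \rangle$.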


Section  LI 

Linear  Independence 


“Linear  independence”  is  one  of  the  most  fundamental  conceptual  ideas  in  linear 
algebra,  along  with  the  notion  of  a span.  So  this  section,  and  the  subsequent  Section 
LDS,  will  explore  this  new  idea. 

Subsection  LISV 

Linearly  Independent  Sets  of  Vectors 

Theorem  SLSLC  tells  us  that  a solution  to  a homogeneous  system  of  equations  is 
a linear  combination  of  the  columns  of  the  coefficient  matrix  that  equals  the  zero 
vector.  We  used  just  this  situation  to  our  advantage  (twice!)  in  Example  SCAD 
where  we  reduced  the  set  of  vectors  used  in  a span  construction  from  four  down  to 
two,  by  declaring  certain  vectors  as  surplus.  The  next  two  definitions  will  allow  us 
to  formalize  this  situation. 

Definition RLDCV Relation of Linear Dependence for Column Vectors
Given a set of vectors $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_n\}$, a true statement of the form

\[
\alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \cdots + \alpha_n\mathbf{u}_n = \mathbf{0}
\]

is a relation of linear dependence on $S$. If this statement is formed in a trivial fashion, i.e. $\alpha_i = 0$, $1 \le i \le n$, then we say it is the trivial relation of linear dependence on $S$. □

Definition LICV Linear Independence of Column Vectors

The set of vectors $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_n\}$ is linearly dependent if there is a relation of linear dependence on $S$ that is not trivial. In the case where the only relation of linear dependence on $S$ is the trivial one, then $S$ is a linearly independent set of vectors. □

Notice  that  a relation  of  linear  dependence  is  an  equation.  Though  most  of  it  is  a 
linear  combination,  it  is  not  a linear  combination  (that  would  be  a vector).  Linear 
independence  is  a property  of  a set  of  vectors.  It  is  easy  to  take  a set  of  vectors, 
and  an  equal  number  of  scalars,  all  zero,  and  form  a linear  combination  that  equals 
the  zero  vector.  When  the  easy  way  is  the  only  way,  then  we  say  the  set  is  linearly 
independent.  Here  are  a couple  of  examples. 

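Example LDS Linearly dependent set in $\mathbb{C}^5$

Consider the set of $n = 4$ vectors from $\mathbb{C}^5$,

\[
S = \left\{
\begin{bmatrix} 2 \\ -1 \\ 3 \\ 1 \\ 2 \end{bmatrix},\,
\begin{bmatrix} 1 \\ 2 \\ -1 \\ 5 \\ 2 \end{bmatrix},\,
\begin{bmatrix} 2 \\ 1 \\ -3 \\ 6 \\ 1 \end{bmatrix},\,
\begin{bmatrix} -6 \\ 7 \\ -1 \\ 0 \\ 1 \end{bmatrix}
\right\}
\]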

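To determine linear independence we first form a relation of linear dependence,

\[
\alpha_1 \begin{bmatrix} 2 \\ -1 \\ 3 \\ 1 \\ 2 \end{bmatrix}
+ \alpha_2 \begin{bmatrix} 1 \\ 2 \\ -1 \\ 5 \\ 2 \end{bmatrix}
+ \alpha_3 \begin{bmatrix} 2 \\ 1 \\ -3 \\ 6 \\ 1 \end{bmatrix}
+ \alpha_4 \begin{bmatrix} -6 \\ 7 \\ -1 \\ 0 \\ 1 \end{bmatrix}
= \mathbf{0}
\]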

We know that $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0$ is a solution to this equation, but that is of no interest whatsoever. That is always the case, no matter what four vectors we might have chosen. We are curious to know if there are other, nontrivial, solutions. Theorem SLSLC tells us that we can find such solutions as solutions to the homogeneous system $\mathcal{LS}(A, \mathbf{0})$ where the coefficient matrix has these four vectors as columns, which we then row-reduce

\[
A = \begin{bmatrix}
2 & 1 & 2 & -6 \\
-1 & 2 & 1 & 7 \\
3 & -1 & -3 & -1 \\
1 & 5 & 6 & 0 \\
2 & 2 & 1 & 1
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & -2 \\
0 & 1 & 0 & 4 \\
0 & 0 & 1 & -3 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]

We could solve this homogeneous system completely, but for this example all we need is one nontrivial solution. Setting the lone free variable to any nonzero value, such as $x_4 = 1$, yields the nontrivial solution

\[
\mathbf{x} = \begin{bmatrix} 2 \\ -4 \\ 3 \\ 1 \end{bmatrix}
\]

Completing our application of Theorem SLSLC, we have

\[
2 \begin{bmatrix} 2 \\ -1 \\ 3 \\ 1 \\ 2 \end{bmatrix}
+ (-4) \begin{bmatrix} 1 \\ 2 \\ -1 \\ 5 \\ 2 \end{bmatrix}
+ 3 \begin{bmatrix} 2 \\ 1 \\ -3 \\ 6 \\ 1 \end{bmatrix}
+ 1 \begin{bmatrix} -6 \\ 7 \\ -1 \\ 0 \\ 1 \end{bmatrix}
= \mathbf{0}
\]

This is a relation of linear dependence on $S$ that is not trivial, so we conclude that $S$ is linearly dependent. △

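Example LIS Linearly independent set in $\mathbb{C}^5$

Consider the set of $n = 4$ vectors from $\mathbb{C}^5$,

\[
T = \left\{
\begin{bmatrix} 2 \\ -1 \\ 3 \\ 1 \\ 2 \end{bmatrix},\,
\begin{bmatrix} 1 \\ 2 \\ -1 \\ 5 \\ 2 \end{bmatrix},\,
\begin{bmatrix} 2 \\ 1 \\ -3 \\ 6 \\ 1 \end{bmatrix},\,
\begin{bmatrix} -6 \\ 7 \\ -1 \\ 1 \\ 1 \end{bmatrix}
\right\}
\]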

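To determine linear independence we first form a relation of linear dependence,

\[
\alpha_1 \begin{bmatrix} 2 \\ -1 \\ 3 \\ 1 \\ 2 \end{bmatrix}
+ \alpha_2 \begin{bmatrix} 1 \\ 2 \\ -1 \\ 5 \\ 2 \end{bmatrix}
+ \alpha_3 \begin{bmatrix} 2 \\ 1 \\ -3 \\ 6 \\ 1 \end{bmatrix}
+ \alpha_4 \begin{bmatrix} -6 \\ 7 \\ -1 \\ 1 \\ 1 \end{bmatrix}
= \mathbf{0}
\]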

We know that $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0$ is a solution to this equation, but that is of no interest whatsoever. That is always the case, no matter what four vectors we might have chosen. We are curious to know if there are other, nontrivial, solutions. Theorem SLSLC tells us that we can find such solutions as solutions to the homogeneous system $\mathcal{LS}(B, \mathbf{0})$ where the coefficient matrix has these four vectors as columns. Row-reducing this coefficient matrix yields,

\[
B = \begin{bmatrix}
2 & 1 & 2 & -6 \\
-1 & 2 & 1 & 7 \\
3 & -1 & -3 & -1 \\
1 & 5 & 6 & 1 \\
2 & 2 & 1 & 1
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]

From the form of this matrix, we see that there are no free variables, so the solution is unique, and because the system is homogeneous, this unique solution is the trivial solution. So we now know that there is but one way to combine the four vectors of $T$ into a relation of linear dependence, and that one way is the easy and obvious way. In this situation we say that the set, $T$, is linearly independent. △

Example LDS and Example LIS relied on solving a homogeneous system of equations to determine linear independence. We can codify this process in a time-saving theorem.

Theorem LIVHS Linearly Independent Vectors and Homogeneous Systems
Suppose that $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_n\} \subseteq \mathbb{C}^m$ is a set of vectors and $A$ is the $m \times n$ matrix whose columns are the vectors in $S$. Then $S$ is a linearly independent set if and only if the homogeneous system $\mathcal{LS}(A, \mathbf{0})$ has a unique solution.

Proof. ($\Leftarrow$) Suppose that $\mathcal{LS}(A, \mathbf{0})$ has a unique solution. Since it is a homogeneous system, this solution must be the trivial solution $\mathbf{x} = \mathbf{0}$. By Theorem SLSLC, this means that the only relation of linear dependence on $S$ is the trivial one. So $S$ is linearly independent.

($\Rightarrow$) We will prove the contrapositive. Suppose that $\mathcal{LS}(A, \mathbf{0})$ does not have a unique solution. Since it is a homogeneous system, it is consistent (Theorem HSC), and so must have infinitely many solutions (Theorem PSSLS). One of these infinitely many solutions must be nontrivial (in fact, almost all of them are), so choose one. By Theorem SLSLC this nontrivial solution will give a nontrivial relation of linear dependence on $S$, so we can conclude that $S$ is a linearly dependent set. ■


Since  Theorem  LIVHS  is  an  equivalence,  we  can  use  it  to  determine  the  linear 
independence  or  dependence  of  any  set  of  column  vectors,  just  by  creating  a matrix 
and  analyzing  the  row-reduced  form.  Let  us  illustrate  this  with  two  more  examples. 


Example  LIHS  Linearly  independent,  homogeneous  system 
Is  the  set  of  vectors 


linearly  independent  or  linearly  dependent? 

Theorem LIVHS suggests we study the matrix, $A$, whose columns are the vectors in $S$. Specifically, we are interested in the size of the solution set for the homogeneous system $\mathcal{LS}(A, \mathbf{0})$, so we row-reduce $A$.

\[
A \xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{bmatrix}
\]

Now, $r = 3$, so there are $n - r = 3 - 3 = 0$ free variables and we see that $\mathcal{LS}(A, \mathbf{0})$ has a unique solution (Theorem HSC, Theorem FVCS). By Theorem LIVHS, the set $S$ is linearly independent. △


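Example LDHS Linearly dependent, homogeneous system
Is the set of vectors

\[
S = \left\{
\begin{bmatrix} 2 \\ -1 \\ 3 \\ 4 \\ 2 \end{bmatrix},\,
\begin{bmatrix} 6 \\ 2 \\ -1 \\ 3 \\ 4 \end{bmatrix},\,
\begin{bmatrix} 4 \\ 3 \\ -4 \\ -1 \\ 2 \end{bmatrix}
\right\}
\]

linearly independent or linearly dependent?

Theorem LIVHS suggests we study the matrix, $A$, whose columns are the vectors in $S$. Specifically, we are interested in the size of the solution set for the homogeneous system $\mathcal{LS}(A, \mathbf{0})$, so we row-reduce $A$.

\[
A = \begin{bmatrix}
2 & 6 & 4 \\
-1 & 2 & 3 \\
3 & -1 & -4 \\
4 & 3 & -1 \\
2 & 4 & 2
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & -1 \\
0 & 1 & 1 \\
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{bmatrix}
\]

Now, $r = 2$, so there is $n - r = 3 - 2 = 1$ free variable and we see that $\mathcal{LS}(A, \mathbf{0})$ has infinitely many solutions (Theorem HSC, Theorem FVCS). By Theorem LIVHS, the set $S$ is linearly dependent. △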

As  an  equivalence,  Theorem  LIVHS  gives  us  a straightforward  way  to  determine 
if  a set  of  vectors  is  linearly  independent  or  dependent. 

Review  Example  LIHS  and  Example  LDHS.  They  are  very  similar,  differing  only 
in  the  last  two  slots  of  the  third  vector.  This  resulted  in  slightly  different  matrices 
when  row-reduced,  and  slightly  different  values  of  r,  the  number  of  nonzero  rows. 
Notice,  too,  that  we  are  less  interested  in  the  actual  solution  set,  and  more  interested 
in  its  form  or  size.  These  observations  allow  us  to  make  a slight  improvement  in 
Theorem  LIVHS. 

Theorem LIVRN Linearly Independent Vectors, r and n

Suppose that $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_n\} \subseteq \mathbb{C}^m$ is a set of vectors and $A$ is the $m \times n$ matrix whose columns are the vectors in $S$. Let $B$ be a matrix in reduced row-echelon form that is row-equivalent to $A$ and let $r$ denote the number of pivot columns in $B$. Then $S$ is linearly independent if and only if $n = r$.

Proof. Theorem LIVHS says the linear independence of $S$ is equivalent to the homogeneous linear system $\mathcal{LS}(A, \mathbf{0})$ having a unique solution. Since $\mathcal{LS}(A, \mathbf{0})$ is consistent (Theorem HSC) we can apply Theorem CSRN to see that the solution is unique exactly when $n = r$. ■

So  now  here  is  an  example  of  the  most  straightforward  way  to  determine  if  a set 
of  column  vectors  is  linearly  independent  or  linearly  dependent.  While  this  method 
can  be  quick  and  easy,  do  not  forget  the  logical  progression  from  the  definition 
of  linear  independence  through  homogeneous  system  of  equations  which  makes  it 
possible. 

Example LDRN Linearly dependent, r and n
Is the set of vectors

\[
S = \left\{
\begin{bmatrix} 2 \\ -1 \\ 3 \\ 1 \\ 0 \\ 3 \end{bmatrix},\,
\begin{bmatrix} 9 \\ -6 \\ -2 \\ 3 \\ 2 \\ 1 \end{bmatrix},\,
\begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \\ 0 \\ 1 \end{bmatrix},\,
\begin{bmatrix} -3 \\ 1 \\ 4 \\ 2 \\ 1 \\ 2 \end{bmatrix},\,
\begin{bmatrix} 6 \\ -2 \\ 1 \\ 4 \\ 3 \\ 2 \end{bmatrix}
\right\}
\]

linearly independent or linearly dependent?

Theorem LIVRN suggests we place these vectors into a matrix as columns and analyze the row-reduced version of the matrix,

\[
\begin{bmatrix}
2 & 9 & 1 & -3 & 6 \\
-1 & -6 & 1 & 1 & -2 \\
3 & -2 & 1 & 4 & 1 \\
1 & 3 & 0 & 2 & 4 \\
0 & 2 & 0 & 1 & 3 \\
3 & 1 & 1 & 2 & 2
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & -1 \\
0 & 1 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 2 \\
0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

Now we need only compute that $r = 4 < 5 = n$ to recognize, via Theorem LIVRN, that $S$ is a linearly dependent set. Boom! △
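
The test of Theorem LIVRN, comparing the number of pivot columns $r$ with the number of vectors $n$, can be sketched in a few lines of Python. This is an illustration under our own naming (`pivot_count` and `is_linearly_independent` are hypothetical helpers, and exact arithmetic with `fractions.Fraction` is an implementation choice). It is applied to the sets of Example LDS and Example LIS, which differ in a single entry.

```python
from fractions import Fraction

def pivot_count(vectors):
    """Row-reduce the matrix whose columns are `vectors` and return r,
    the number of pivot columns."""
    # Row i of the matrix collects entry i of every vector.
    m = [[Fraction(v[i]) for v in vectors] for i in range(len(vectors[0]))]
    r = 0
    for c in range(len(vectors)):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    """Theorem LIVRN: the set is linearly independent exactly when n = r."""
    return pivot_count(vectors) == len(vectors)

# The set of Example LDS and the set of Example LIS.
S = [(2, -1, 3, 1, 2), (1, 2, -1, 5, 2), (2, 1, -3, 6, 1), (-6, 7, -1, 0, 1)]
T = [(2, -1, 3, 1, 2), (1, 2, -1, 5, 2), (2, 1, -3, 6, 1), (-6, 7, -1, 1, 1)]

print(pivot_count(S), is_linearly_independent(S))   # 3 False
print(pivot_count(T), is_linearly_independent(T))   # 4 True
```

The function returns only a yes-or-no answer. As the text cautions, the logical route from the definition through homogeneous systems is what justifies trusting it.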

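Example LLDS Large linearly dependent set in $\mathbb{C}^4$
Consider the set of $n = 9$ vectors from $\mathbb{C}^4$,

\[
R = \left\{
\begin{bmatrix} -1 \\ 3 \\ 1 \\ 2 \end{bmatrix},\,
\begin{bmatrix} 7 \\ 1 \\ -3 \\ 6 \end{bmatrix},\,
\begin{bmatrix} 1 \\ 2 \\ -1 \\ -2 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 4 \\ 2 \\ 9 \end{bmatrix},\,
\begin{bmatrix} 5 \\ -2 \\ 4 \\ 3 \end{bmatrix},\,
\begin{bmatrix} 2 \\ 1 \\ -6 \\ 4 \end{bmatrix},\,
\begin{bmatrix} 3 \\ 0 \\ -3 \\ 1 \end{bmatrix},\,
\begin{bmatrix} 1 \\ 1 \\ 5 \\ 3 \end{bmatrix},\,
\begin{bmatrix} -6 \\ -1 \\ 1 \\ 1 \end{bmatrix}
\right\}
\]
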

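To employ Theorem LIVHS, we form a $4 \times 9$ matrix, $C$, whose columns are the vectors in $R$

\[
C = \begin{bmatrix}
-1 & 7 & 1 & 0 & 5 & 2 & 3 & 1 & -6 \\
3 & 1 & 2 & 4 & -2 & 1 & 0 & 1 & -1 \\
1 & -3 & -1 & 2 & 4 & -6 & -3 & 5 & 1 \\
2 & 6 & -2 & 9 & 3 & 4 & 1 & 3 & 1
\end{bmatrix}
\]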


To determine if the homogeneous system $\mathcal{LS}(C, \mathbf{0})$ has a unique solution or not, we would normally row-reduce this matrix. But in this particular example, we can do better. Theorem HMVEI tells us that since the system is homogeneous with $n = 9$ variables in $m = 4$ equations, and $n > m$, there must be infinitely many solutions. Since there is not a unique solution, Theorem LIVHS says the set is linearly dependent. △


The  situation  in  Example  LLDS  is  slick  enough  to  warrant  formulating  as  a 
theorem. 


Theorem MVSLD More Vectors than Size implies Linear Dependence
Suppose that $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_n\} \subseteq \mathbb{C}^m$ and $n > m$. Then $S$ is a linearly dependent set.


Proof. Form the $m \times n$ matrix $A$ whose columns are $\mathbf{u}_i$, $1 \le i \le n$. Consider the homogeneous system $\mathcal{LS}(A, \mathbf{0})$. By Theorem HMVEI this system has infinitely many solutions. Since the system does not have a unique solution, Theorem LIVHS says the columns of $A$ form a linearly dependent set, as desired. ■
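
Theorem MVSLD guarantees a nontrivial relation of linear dependence without exhibiting one. The Python sketch below goes one step further for the nine vectors of Example LLDS: it row-reduces the matrix of columns, sets the first free variable to 1, and reads off the scalars of a relation. The function name `nontrivial_relation` is our own invention, and using exact `Fraction` arithmetic is an implementation choice.

```python
from fractions import Fraction

def nontrivial_relation(vectors):
    """Return scalars (not all zero) of a relation of linear dependence on
    `vectors`, or None when the set is linearly independent."""
    n = len(vectors)
    m = [[Fraction(v[i]) for v in vectors] for i in range(len(vectors[0]))]
    pivots, r = [], 0
    for c in range(n):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n) if c not in pivots]
    if not free:
        return None                       # n = r: only the trivial relation
    # Set the first free variable to 1 and all other free variables to 0.
    f = free[0]
    scalars = [Fraction(0)] * n
    scalars[f] = Fraction(1)
    for k, d in enumerate(pivots):
        scalars[d] = -m[k][f]
    return scalars

R = [(-1, 3, 1, 2), (7, 1, -3, 6), (1, 2, -1, -2), (0, 4, 2, 9), (5, -2, 4, 3),
     (2, 1, -6, 4), (3, 0, -3, 1), (1, 1, 5, 3), (-6, -1, 1, 1)]

relation = nontrivial_relation(R)
# Form the linear combination and confirm that it is the zero vector.
total = [sum(a * v[i] for a, v in zip(relation, R)) for i in range(4)]
print(relation is not None, all(t == 0 for t in total))
```

With more vectors than entries there can be at most four pivot columns, so a free variable always exists and the function cannot return `None` for a set like $R$.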
Subsection LINM

Linear Independence and Nonsingular Matrices


We will now specialize to sets of $n$ vectors from $\mathbb{C}^n$. This will put Theorem MVSLD off-limits, while Theorem LIVHS will involve square matrices. Let us begin by contrasting Archetype A and Archetype B.


Example  LDCAA  Linearly  dependent  columns  in  Archetype  A 
Archetype  A is  a system  of  linear  equations  with  coefficient  matrix, 


\[
A = \begin{bmatrix}
1 & -1 & 2 \\
2 & 1 & 1 \\
1 & 1 & 0
\end{bmatrix}
\]

Do the columns of this matrix form a linearly independent or dependent set? By Example S we know that $A$ is singular. According to the definition of nonsingular matrices, Definition NM, the homogeneous system $\mathcal{LS}(A, \mathbf{0})$ has infinitely many solutions. So by Theorem LIVHS, the columns of $A$ form a linearly dependent set. △


Example  LICAB  Linearly  independent  columns  in  Archetype  B 
Archetype  B is  a system  of  linear  equations  with  coefficient  matrix, 


\[
B = \begin{bmatrix}
-7 & -6 & -12 \\
5 & 5 & 7 \\
1 & 0 & 4
\end{bmatrix}
\]

Do the columns of this matrix form a linearly independent or dependent set? By Example NM we know that $B$ is nonsingular. According to the definition of nonsingular matrices, Definition NM, the homogeneous system $\mathcal{LS}(B, \mathbf{0})$ has a unique solution. So by Theorem LIVHS, the columns of $B$ form a linearly independent set. △


That  Archetype  A and  Archetype  B have  opposite  properties  for  the  columns 
of  their  coefficient  matrices  is  no  accident.  Here  is  the  theorem,  and  then  we  will 
update  our  equivalences  for  nonsingular  matrices,  Theorem  NME1. 

Theorem  NMLIC  Nonsingular  Matrices  have  Linearly  Independent  Columns 
Suppose  that  A is  a square  matrix.  Then  A is  nonsingular  if  and  only  if  the  columns 
of  A form  a linearly  independent  set. 


Proof. This is a proof where we can chain together equivalences, rather than proving the two halves separately.

\begin{align*}
A \text{ nonsingular} &\iff \mathcal{LS}(A, \mathbf{0}) \text{ has a unique solution} && \text{Definition NM} \\
&\iff \text{columns of } A \text{ are linearly independent} && \text{Theorem LIVHS}
\end{align*}

■

Here is the update to Theorem NME1.

Theorem  NME2  Nonsingular  Matrix  Equivalences,  Round  2 
Suppose  that  A is  a square  matrix.  The  following  are  equivalent. 

1.  A is  nonsingular. 

2.  A row-reduces  to  the  identity  matrix. 

3. The null space of $A$ contains only the zero vector, $\mathcal{N}(A) = \{\mathbf{0}\}$.

4. The linear system $\mathcal{LS}(A, \mathbf{b})$ has a unique solution for every possible choice of $\mathbf{b}$.

5.  The  columns  of  A form  a linearly  independent  set. 

Proof.  Theorem  NMLIC  is  yet  another  equivalence  for  a nonsingular  matrix,  so  we 
can  add  it  to  the  list  in  Theorem  NME1.  ■ 
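
Two of the equivalent conditions in Theorem NME2 are easy to compare by machine: whether a square matrix row-reduces to the identity (item 2), and whether its columns are linearly independent (item 5). The following Python sketch, with helper names of our own choosing, checks both conditions on the coefficient matrices of Archetype A and Archetype B as given in Example LDCAA and Example LICAB.

```python
from fractions import Fraction

def rref(rows):
    """Return (B, pivots): the reduced row-echelon form and the 0-based
    indices of the pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def reduces_to_identity(rows):
    """Item 2 of Theorem NME2 for a square matrix."""
    n = len(rows)
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    return rref(rows)[0] == identity

def columns_independent(rows):
    """Item 5 of Theorem NME2, tested as n = r (Theorem LIVRN)."""
    return len(rref(rows)[1]) == len(rows[0])

archetype_A = [[1, -1, 2], [2, 1, 1], [1, 1, 0]]
archetype_B = [[-7, -6, -12], [5, 5, 7], [1, 0, 4]]

for name, M in (("A", archetype_A), ("B", archetype_B)):
    print(name, reduces_to_identity(M), columns_independent(M))
# A False False
# B True True
```

The two answers agree for each matrix, as the theorem says they must.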


Subsection  NSSLI 

Null  Spaces,  Spans,  Linear  Independence 


In  Subsection  SS.SSNS  we  proved  Theorem  SSNS  which  provided  n — r vectors 
that  could  be  used  with  the  span  construction  to  build  the  entire  null  space  of  a 
matrix.  As  we  have  hinted  in  Example  SCAD,  and  as  we  will  see  again  going  forward, 
linearly  dependent  sets  carry  redundant  vectors  with  them  when  used  in  building 
a set  as  a span.  Our  aim  now  is  to  show  that  the  vectors  provided  by  Theorem 
SSNS  form  a linearly  independent  set,  so  in  one  sense  they  are  as  efficient  as  possible 
a way to describe the null space. Notice that the vectors $\mathbf{z}_j$, $1 \le j \le n - r$ first
appear  in  the  vector  form  of  solutions  to  arbitrary  linear  systems  (Theorem  VFSLS). 
The  exact  same  vectors  appear  again  in  the  span  construction  in  the  conclusion  of 
Theorem  SSNS.  Since  this  second  theorem  specializes  to  homogeneous  systems  the 
only  real  difference  is  that  the  vector  c in  Theorem  VFSLS  is  the  zero  vector  for  a 
homogeneous  system.  Finally,  Theorem  BNS  will  now  show  that  these  same  vectors 
are  a linearly  independent  set.  We  will  set  the  stage  for  the  proof  of  this  theorem 
with  a moderately  large  example.  Study  the  example  carefully,  as  it  will  make  it 
easier  to  understand  the  proof. 


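Example LINSB Linear independence of null space basis

Suppose that we are interested in the null space of a $3 \times 7$ matrix, $A$, which row-reduces to

\[
B = \begin{bmatrix}
1 & 0 & -2 & 4 & 0 & 3 & 9 \\
0 & 1 & 5 & 6 & 0 & 7 & 1 \\
0 & 0 & 0 & 0 & 1 & 8 & -5
\end{bmatrix}
\]

The set $F = \{3, 4, 6, 7\}$ is the set of indices for our four free variables that would be used in a description of the solution set for the homogeneous system $\mathcal{LS}(A, \mathbf{0})$. Applying Theorem SSNS we can begin to construct a set of four vectors whose span is the null space of $A$, a set of vectors we will reference as $T$.

\[
\mathcal{N}(A) = \langle T \rangle = \langle \{\mathbf{z}_1,\, \mathbf{z}_2,\, \mathbf{z}_3,\, \mathbf{z}_4\} \rangle
= \left\langle \left\{
\begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 1 \\ 0 \\ \phantom{0} \\ 0 \\ 0 \end{bmatrix},\,
\begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 1 \\ \phantom{0} \\ 0 \\ 0 \end{bmatrix},\,
\begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ \phantom{0} \\ 1 \\ 0 \end{bmatrix},\,
\begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ \phantom{0} \\ 0 \\ 1 \end{bmatrix}
\right\} \right\rangle
\]

So far, we have constructed as much of these individual vectors as we can, based just on the knowledge of the contents of the set $F$. This has allowed us to determine the entries in slots 3, 4, 6 and 7, while we have left slots 1, 2 and 5 blank. Without doing any more, let us ask if $T$ is linearly independent. Begin with a relation of linear dependence on $T$, and see what we can learn about the scalars,

\begin{align*}
\mathbf{0} &= \alpha_1\mathbf{z}_1 + \alpha_2\mathbf{z}_2 + \alpha_3\mathbf{z}_3 + \alpha_4\mathbf{z}_4 \\
&= \alpha_1 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 1 \\ 0 \\ \phantom{0} \\ 0 \\ 0 \end{bmatrix}
+ \alpha_2 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 1 \\ \phantom{0} \\ 0 \\ 0 \end{bmatrix}
+ \alpha_3 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ \phantom{0} \\ 1 \\ 0 \end{bmatrix}
+ \alpha_4 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ \phantom{0} \\ 0 \\ 1 \end{bmatrix} \\
&= \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \alpha_1 \\ 0 \\ \phantom{0} \\ 0 \\ 0 \end{bmatrix}
+ \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ \alpha_2 \\ \phantom{0} \\ 0 \\ 0 \end{bmatrix}
+ \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ \phantom{0} \\ \alpha_3 \\ 0 \end{bmatrix}
+ \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ \phantom{0} \\ 0 \\ \alpha_4 \end{bmatrix}
= \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \alpha_1 \\ \alpha_2 \\ \phantom{0} \\ \alpha_3 \\ \alpha_4 \end{bmatrix}
\end{align*}

Applying Definition CVE to the two ends of this chain of equalities, we see that $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0$. So the only relation of linear dependence on the set $T$ is a trivial one. By Definition LICV the set $T$ is linearly independent. The important feature of this example is how the “pattern of zeros and ones” in the four vectors led to the conclusion of linear independence. △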


The proof of Theorem BNS is really quite straightforward, and relies on the “pattern of zeros and ones” that arises in the vectors $\mathbf{z}_j$, $1 \le j \le n - r$, in the entries that correspond to the locations of the non-pivot columns. Play along with Example LINSB as you study the proof. Also, take a look at Example VFSAD, Example VFSAI and Example VFSAL, especially at the conclusion of Step 2 (temporarily ignore the construction of the constant vector, $\mathbf{c}$). This proof is also a good first example of how to prove a conclusion that states a set is linearly independent.

Theorem  BNS  Basis  for  Null  Spaces 

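Suppose that $A$ is an $m \times n$ matrix, and $B$ is a row-equivalent matrix in reduced row-echelon form with $r$ pivot columns. Let $D = \{d_1, d_2, d_3, \ldots, d_r\}$ and $F = \{f_1, f_2, f_3, \ldots, f_{n-r}\}$ be the sets of column indices where $B$ does and does not (respectively) have pivot columns. Construct the $n - r$ vectors $\mathbf{z}_j$, $1 \le j \le n - r$, of size $n$ as

$$[\mathbf{z}_j]_i = \begin{cases} 1 & \text{if } i \in F,\ i = f_j \\ 0 & \text{if } i \in F,\ i \ne f_j \\ -[B]_{k, f_j} & \text{if } i \in D,\ i = d_k \end{cases}$$

Define the set $S = \{\mathbf{z}_1, \mathbf{z}_2, \mathbf{z}_3, \ldots, \mathbf{z}_{n-r}\}$. Then

1. $\mathcal{N}(A) = \langle S \rangle$.

2. $S$ is a linearly independent set.

Proof. Notice first that the vectors $\mathbf{z}_j$, $1 \le j \le n - r$, are exactly the same as the $n - r$ vectors defined in Theorem SSNS. Also, the hypotheses of Theorem SSNS are the same as the hypotheses of the theorem we are currently proving. So it is then simply the conclusion of Theorem SSNS that tells us that $\mathcal{N}(A) = \langle S \rangle$. That was the easy half, but the second part is not much harder. What is new here is the claim that $S$ is a linearly independent set.

To prove the linear independence of a set, we need to start with a relation of linear dependence and somehow conclude that the scalars involved must all be zero, i.e. that the relation of linear dependence only happens in the trivial fashion. So to establish the linear independence of $S$, we start with

$$\alpha_1 \mathbf{z}_1 + \alpha_2 \mathbf{z}_2 + \alpha_3 \mathbf{z}_3 + \cdots + \alpha_{n-r} \mathbf{z}_{n-r} = \mathbf{0}.$$

For each $j$, $1 \le j \le n - r$, consider the equality of the individual entries of the vectors on both sides of this equality in position $f_j$,

$$\begin{aligned}
0 &= [\mathbf{0}]_{f_j} \\
&= [\alpha_1 \mathbf{z}_1 + \alpha_2 \mathbf{z}_2 + \alpha_3 \mathbf{z}_3 + \cdots + \alpha_{n-r} \mathbf{z}_{n-r}]_{f_j} && \text{Definition CVE} \\
&= [\alpha_1 \mathbf{z}_1]_{f_j} + [\alpha_2 \mathbf{z}_2]_{f_j} + [\alpha_3 \mathbf{z}_3]_{f_j} + \cdots + [\alpha_{n-r} \mathbf{z}_{n-r}]_{f_j} && \text{Definition CVA} \\
&= \alpha_1 [\mathbf{z}_1]_{f_j} + \alpha_2 [\mathbf{z}_2]_{f_j} + \alpha_3 [\mathbf{z}_3]_{f_j} + \cdots + \alpha_{j-1} [\mathbf{z}_{j-1}]_{f_j} + \alpha_j [\mathbf{z}_j]_{f_j} + \alpha_{j+1} [\mathbf{z}_{j+1}]_{f_j} + \cdots + \alpha_{n-r} [\mathbf{z}_{n-r}]_{f_j} && \text{Definition CVSM} \\
&= \alpha_1 (0) + \alpha_2 (0) + \alpha_3 (0) + \cdots + \alpha_{j-1}(0) + \alpha_j (1) + \alpha_{j+1}(0) + \cdots + \alpha_{n-r}(0) && \text{Definition of } \mathbf{z}_j \\
&= \alpha_j
\end{aligned}$$

So for all $j$, $1 \le j \le n - r$, we have $\alpha_j = 0$, which is the conclusion that tells us that the only relation of linear dependence on $S = \{\mathbf{z}_1, \mathbf{z}_2, \mathbf{z}_3, \ldots, \mathbf{z}_{n-r}\}$ is the trivial one. Hence, by Definition LICV the set is linearly independent, as desired. ■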
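The construction in Theorem BNS can be sketched in a few lines of Python. This is only an illustrative sketch, not the text's Sage code: the helper names `rref` and `null_space_basis` and the small matrix `A` are assumptions made for the example, indices are 0-based, and exact arithmetic is done with the standard library's `Fraction`.

```python
from fractions import Fraction


def rref(rows):
    """Return (B, pivots): reduced row-echelon form and 0-based pivot columns."""
    B = [[Fraction(x) for x in row] for row in rows]
    m, n = len(B), len(B[0])
    pivots, r = [], 0
    for c in range(n):
        if r == m:
            break
        # Look for a nonzero entry in column c, at or below row r.
        p = next((i for i in range(r, m) if B[i][c] != 0), None)
        if p is None:
            continue  # no pivot here, so column c is a free column
        B[r], B[p] = B[p], B[r]
        lead = B[r][c]
        B[r] = [x / lead for x in B[r]]
        for i in range(m):
            if i != r:
                factor = B[i][c]
                B[i] = [a - factor * b for a, b in zip(B[i], B[r])]
        pivots.append(c)
        r += 1
    return B, pivots


def null_space_basis(rows):
    """Build the vectors z_j of Theorem BNS, one per non-pivot column f_j."""
    B, D = rref(rows)
    n = len(rows[0])
    F = [c for c in range(n) if c not in D]
    basis = []
    for fj in F:
        z = [Fraction(0)] * n
        z[fj] = Fraction(1)          # entry f_j is 1, other free entries stay 0
        for k, dk in enumerate(D):
            z[dk] = -B[k][fj]        # entry d_k is -[B]_{k, f_j}
        basis.append(z)
    return basis


# Illustrative matrix (an assumption, not taken from the text).
A = [[1, 2, 1, 7],
     [2, 4, 3, 18]]
S = null_space_basis(A)
```

Looking only at the free positions of the resulting vectors shows the pattern of zeros and ones that the proof exploits.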


Example  NSLIL  Null  space  spanned  by  linearly  independent  set,  Archetype  L 
In  Example  VFSAL  we  previewed  Theorem  SSNS  by  finding  a set  of  two  vectors 
such  that  their  span  was  the  null  space  for  the  matrix  in  Archetype  L.  Writing  the 
matrix  as  L , we  have 


Solving the homogeneous system $\mathcal{LS}(L, \mathbf{0})$ resulted in recognizing $x_4$ and $x_5$ as the free variables. So look in entries 4 and 5 of the two vectors above and notice the pattern of zeros and ones that provides the linear independence of the set. △


Reading  Questions 


1.  Let  S be  the  set  of  three  vectors  below. 


Is  S linearly  independent  or  linearly  dependent?  Explain  why. 

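2. Let $S$ be the set of three vectors below.

$$S = \left\{ \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix},\ \begin{bmatrix} 3 \\ 2 \\ 2 \end{bmatrix},\ \begin{bmatrix} 4 \\ 3 \\ -4 \end{bmatrix} \right\}$$

Is $S$ linearly independent or linearly dependent? Explain why.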

3. Is the matrix below singular or nonsingular? Explain your answer using only the final conclusion you reached in the previous question, along with one new theorem.

$$\begin{bmatrix} 1 & 3 & 4 \\ -1 & 2 & 3 \\ 0 & 2 & -4 \end{bmatrix}$$


Exercises 

Determine if the sets of vectors in Exercises C20-C25 are linearly independent or linearly dependent. When the set is linearly dependent, exhibit a nontrivial relation of linear dependence.

C30† For the matrix $B$ below, find a set $S$ that is linearly independent and spans the null space of $B$, that is, $\mathcal{N}(B) = \langle S \rangle$.

$$B = \begin{bmatrix} -3 & 1 & -2 & 7 \\ -1 & 2 & 1 & 4 \\ 1 & 1 & 2 & -1 \end{bmatrix}$$


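C31† For the matrix $A$ below, find a linearly independent set $S$ so that the null space of $A$ is spanned by $S$, that is, $\mathcal{N}(A) = \langle S \rangle$.

$$A = \begin{bmatrix} -1 & -2 & 2 & 1 & 5 \\ 1 & 2 & 1 & 1 & 5 \\ 3 & 6 & 1 & 2 & 7 \\ 2 & 4 & 0 & 1 & 2 \end{bmatrix}$$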

C32† Find a set of column vectors, $T$, such that (1) the span of $T$ is the null space of $B$, $\langle T \rangle = \mathcal{N}(B)$, and (2) $T$ is a linearly independent set.

$$B = \begin{bmatrix} 2 & 1 & 1 & 1 \\ -4 & -3 & 1 & -7 \\ 1 & 1 & -1 & 3 \end{bmatrix}$$

C33† Find a set $S$ so that $S$ is linearly independent and $\mathcal{N}(A) = \langle S \rangle$, where $\mathcal{N}(A)$ is the null space of the matrix $A$ below.

$$A = \begin{bmatrix} 2 & 3 & 3 & 1 & 4 \\ 1 & 1 & -1 & -1 & -3 \\ 3 & 2 & -8 & -1 & 1 \end{bmatrix}$$


C50  Consider  each  archetype  that  is  a system  of  equations  and  consider  the  solutions 
listed  for  the  homogeneous  version  of  the  archetype.  (If  only  the  trivial  solution  is  listed, 
then  assume  this  is  the  only  solution  to  the  system.)  From  the  solution  set,  determine  if 
the  columns  of  the  coefficient  matrix  form  a linearly  independent  or  linearly  dependent 
set.  In  the  case  of  a linearly  dependent  set,  use  one  of  the  sample  solutions  to  provide 
a nontrivial  relation  of  linear  dependence  on  the  set  of  columns  of  the  coefficient  matrix 
(Definition  RLD).  Indicate  when  Theorem  MVSLD  applies  and  connect  this  with  the 
number  of  variables  and  equations  in  the  system  of  equations. 


Archetype  A,  Archetype  B,  Archetype  C,  Archetype  D/Archetype  E,  Archetype  F,  Archetype 
G/Archetype  H,  Archetype  I,  Archetype  J 

C51  For  each  archetype  that  is  a system  of  equations  consider  the  homogeneous  version. 
Write  elements  of  the  solution  set  in  vector  form  (Theorem  VFSLS)  and  from  this  extract 
the vectors $\mathbf{z}_j$ described in Theorem BNS. These vectors are used in a span construction to describe the null space of the coefficient matrix for each archetype. What does it mean when we write a null space as $\langle \{\ \} \rangle$?


Archetype  A,  Archetype  B,  Archetype  C,  Archetype  D/Archetype  E,  Archetype  F,  Archetype 
G/Archetype  H,  Archetype  I,  Archetype  J 

C52  For  each  archetype  that  is  a system  of  equations  consider  the  homogeneous  version. 
Sample  solutions  are  given  and  a linearly  independent  spanning  set  is  given  for  the  null 
space of the coefficient matrix. Write each of the sample solutions individually as a linear
combination of the vectors in the spanning set for the null space of the coefficient matrix.


Archetype  A,  Archetype  B,  Archetype  C,  Archetype  D/Archetype  E,  Archetype  F,  Archetype 
G/Archetype  H,  Archetype  I,  Archetype  J 

C60† For the matrix $A$ below, find a set of vectors $S$ so that (1) $S$ is linearly independent, and (2) the span of $S$ equals the null space of $A$, $\langle S \rangle = \mathcal{N}(A)$. (See Exercise SS.C60.)

$$A = \begin{bmatrix} 1 & 1 & 6 & -8 \\ 1 & -2 & 0 & 1 \\ -2 & 1 & -6 & 7 \end{bmatrix}$$


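M20† Suppose that $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a set of three vectors from $\mathbb{C}^{873}$. Prove that the set

$$T = \{2\mathbf{v}_1 + 3\mathbf{v}_2 + \mathbf{v}_3,\ \mathbf{v}_1 - \mathbf{v}_2 - 2\mathbf{v}_3,\ 2\mathbf{v}_1 + \mathbf{v}_2 - \mathbf{v}_3\}$$

is linearly dependent.

M21† Suppose that $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a linearly independent set of three vectors from $\mathbb{C}^{873}$. Prove that the set

$$T = \{2\mathbf{v}_1 + 3\mathbf{v}_2 + \mathbf{v}_3,\ \mathbf{v}_1 - \mathbf{v}_2 + 2\mathbf{v}_3,\ 2\mathbf{v}_1 + \mathbf{v}_2 - \mathbf{v}_3\}$$

is linearly independent.

M50† Consider the set of vectors from $\mathbb{C}^3$, $W$, given below. Find a set $T$ that contains three vectors from $W$ and such that $W = \langle T \rangle$.

$$W = \langle \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4, \mathbf{v}_5\} \rangle$$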


M51† Consider the subspace $W = \langle \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\} \rangle$. Find a set $S$ so that (1) $S$ is a subset of $W$, (2) $S$ is linearly independent, and (3) $W = \langle S \rangle$. Write each vector not included in $S$ as a linear combination of the vectors that are in $S$.


Vi  = 

1 

-1 

v2  = 

^ 1 
1 

v3  = 

M w 

v4  = 

2 

1 

2 

8 

—7 

7 

T10  Prove  that  if  a set  of  vectors  contains  the  zero  vector,  then  the  set  is  linearly 
dependent.  (Ed.  “The  zero  vector  is  death  to  linearly  independent  sets.”) 

T12 Suppose that $S$ is a linearly independent set of vectors, and $T$ is a subset of $S$, $T \subseteq S$ (Definition SSET). Prove that $T$ is linearly independent.

T13 Suppose that $T$ is a linearly dependent set of vectors, and $T$ is a subset of $S$, $T \subseteq S$ (Definition SSET). Prove that $S$ is linearly dependent.

T15† Suppose that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_n\}$ is a set of vectors. Prove that
$$\{\mathbf{v}_1 - \mathbf{v}_2,\ \mathbf{v}_2 - \mathbf{v}_3,\ \mathbf{v}_3 - \mathbf{v}_4,\ \ldots,\ \mathbf{v}_n - \mathbf{v}_1\}$$
is a linearly dependent set.

T20† Suppose that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ is a linearly independent set in $\mathbb{C}^{35}$. Prove that
$$\{\mathbf{v}_1,\ \mathbf{v}_1 + \mathbf{v}_2,\ \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_3,\ \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_3 + \mathbf{v}_4\}$$
is a linearly independent set.

T50† Suppose that $A$ is an $m \times n$ matrix with linearly independent columns and the linear system $\mathcal{LS}(A, \mathbf{b})$ is consistent. Show that this system has a unique solution. (Notice that we are not requiring $A$ to be square.)


Section  LDS 

Linear  Dependence  and  Spans 


In  any  linearly  dependent  set  there  is  always  one  vector  that  can  be  written  as  a 
linear  combination  of  the  others.  This  is  the  substance  of  the  upcoming  Theorem 
DLDS.  Perhaps  this  will  explain  the  use  of  the  word  “dependent.”  In  a linearly 
dependent  set,  at  least  one  vector  “depends”  on  the  others  (via  a linear  combination). 

Indeed,  because  Theorem  DLDS  is  an  equivalence  (Proof  Technique  E)  some 
authors  use  this  condition  as  a definition  (Proof  Technique  D)  of  linear  dependence. 
Then  linear  independence  is  defined  as  the  logical  opposite  of  linear  dependence.  Of 
course,  we  have  chosen  to  take  Definition  LICV  as  our  definition,  and  then  follow 
with  Theorem  DLDS  as  a theorem. 


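Subsection LDSS

Linearly Dependent Sets and Spans


If we use a linearly dependent set to construct a span, then we can always create the same infinite set with a starting set that is one vector smaller in size. We will illustrate this behavior in Example RSC5. However, this will not be possible if we build a span from a linearly independent set. So in a certain sense, using a linearly independent set to formulate a span is the best possible way: there are not any extra vectors being used to build up all the necessary linear combinations. OK, here is the theorem, and then the example.

Theorem DLDS Dependency in Linearly Dependent Sets

Suppose that $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_n\}$ is a set of vectors. Then $S$ is a linearly dependent set if and only if there is an index $t$, $1 \le t \le n$, such that $\mathbf{u}_t$ is a linear combination of the vectors $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_{t-1}, \mathbf{u}_{t+1}, \ldots, \mathbf{u}_n$.

Proof. $(\Rightarrow)$ Suppose that $S$ is linearly dependent, so there exists a nontrivial relation of linear dependence by Definition LICV. That is, there are scalars, $\alpha_i$, $1 \le i \le n$, which are not all zero, such that

$$\alpha_1 \mathbf{u}_1 + \alpha_2 \mathbf{u}_2 + \alpha_3 \mathbf{u}_3 + \cdots + \alpha_n \mathbf{u}_n = \mathbf{0}.$$

Since the $\alpha_i$ cannot all be zero, choose one, say $\alpha_t$, that is nonzero. Then,

$$\begin{aligned}
\mathbf{u}_t &= \frac{-1}{\alpha_t} \left( -\alpha_t \mathbf{u}_t \right) && \text{Property MICN} \\
&= \frac{-1}{\alpha_t} \left( \alpha_1 \mathbf{u}_1 + \cdots + \alpha_{t-1} \mathbf{u}_{t-1} + \alpha_{t+1} \mathbf{u}_{t+1} + \cdots + \alpha_n \mathbf{u}_n \right) && \text{Theorem VSPCV} \\
&= \frac{-\alpha_1}{\alpha_t} \mathbf{u}_1 + \cdots + \frac{-\alpha_{t-1}}{\alpha_t} \mathbf{u}_{t-1} + \frac{-\alpha_{t+1}}{\alpha_t} \mathbf{u}_{t+1} + \cdots + \frac{-\alpha_n}{\alpha_t} \mathbf{u}_n && \text{Theorem VSPCV}
\end{aligned}$$

Since the values of $\frac{\alpha_i}{\alpha_t}$ are again scalars, we have expressed $\mathbf{u}_t$ as a linear combination of the other elements of $S$.

$(\Leftarrow)$ Assume that the vector $\mathbf{u}_t$ is a linear combination of the other vectors in $S$. Write this linear combination, denoting the relevant scalars as $\beta_1, \beta_2, \ldots, \beta_{t-1}, \beta_{t+1}, \ldots, \beta_n$, as

$$\mathbf{u}_t = \beta_1 \mathbf{u}_1 + \beta_2 \mathbf{u}_2 + \cdots + \beta_{t-1} \mathbf{u}_{t-1} + \beta_{t+1} \mathbf{u}_{t+1} + \cdots + \beta_n \mathbf{u}_n$$

Then we have

$$\begin{aligned}
\beta_1 \mathbf{u}_1 &+ \cdots + \beta_{t-1} \mathbf{u}_{t-1} + (-1)\mathbf{u}_t + \beta_{t+1} \mathbf{u}_{t+1} + \cdots + \beta_n \mathbf{u}_n \\
&= \mathbf{u}_t + (-1)\mathbf{u}_t && \text{Theorem VSPCV} \\
&= (1 + (-1))\mathbf{u}_t && \text{Property DSAC} \\
&= 0\mathbf{u}_t && \text{Property AICN} \\
&= \mathbf{0} && \text{Definition CVSM}
\end{aligned}$$

So the scalars $\beta_1, \beta_2, \beta_3, \ldots, \beta_{t-1}, \beta_t = -1, \beta_{t+1}, \ldots, \beta_n$ provide a nontrivial linear combination of the vectors in $S$, thus establishing that $S$ is a linearly dependent set (Definition LICV). ■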
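The forward direction of the proof is entirely mechanical: pick a nonzero scalar $\alpha_t$ and divide through by it. The following Python sketch mirrors that step. The function name `solve_for` and the sample scalars are assumptions chosen for illustration, and indices are 0-based rather than the 1-based indices of the theorem.

```python
from fractions import Fraction


def solve_for(alphas, t):
    """Given scalars of a relation of linear dependence and an index t with
    alphas[t] != 0, return {i: coefficient of u_i} expressing u_t in terms
    of the other vectors, i.e. coefficient_i = -alpha_i / alpha_t."""
    if alphas[t] == 0:
        raise ValueError("cannot solve for a vector whose scalar is zero")
    a_t = Fraction(alphas[t])
    return {i: -Fraction(a) / a_t for i, a in enumerate(alphas) if i != t}


# Sample relation (-4)u_0 + 0u_1 + (-1)u_2 + 1u_3 = 0, solved for u_2.
coeffs = solve_for([-4, 0, -1, 1], 2)
```

Notice that the function refuses an index whose scalar is zero, which is exactly the restriction "choose one that is nonzero" in the proof.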


This  theorem  can  be  used,  sometimes  repeatedly,  to  whittle  down  the  size  of  a 
set  of  vectors  used  in  a span  construction.  We  have  seen  some  of  this  already  in 
Example  SCAD,  but  in  the  next  example  we  will  detail  some  of  the  subtleties. 


Example  RSC5  Reducing  a span  in  C5 
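Consider the set of $n = 4$ vectors from $\mathbb{C}^5$,

$$R = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\} = \left\{ \begin{bmatrix} 1 \\ 2 \\ -1 \\ 3 \\ 2 \end{bmatrix},\ \begin{bmatrix} 2 \\ 1 \\ 3 \\ 1 \\ 2 \end{bmatrix},\ \begin{bmatrix} 0 \\ -7 \\ 6 \\ -11 \\ -2 \end{bmatrix},\ \begin{bmatrix} 4 \\ 1 \\ 2 \\ 1 \\ 6 \end{bmatrix} \right\}$$

and define $V = \langle R \rangle$.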

To employ Theorem LIVHS, we form a $5 \times 4$ matrix, $D$, and row-reduce to understand solutions to the homogeneous system $\mathcal{LS}(D, \mathbf{0})$,

$$D = \begin{bmatrix} 1 & 2 & 0 & 4 \\ 2 & 1 & -7 & 1 \\ -1 & 3 & 6 & 2 \\ 3 & 1 & -11 & 1 \\ 2 & 2 & -2 & 6 \end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} \boxed{1} & 0 & 0 & 4 \\ 0 & \boxed{1} & 0 & 0 \\ 0 & 0 & \boxed{1} & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

We can find infinitely many solutions to this system, most of them nontrivial, and we choose any one we like to build a relation of linear dependence on $R$. Let us begin with $x_4 = 1$, to find the solution

$$\begin{bmatrix} -4 \\ 0 \\ -1 \\ 1 \end{bmatrix}$$

So we can write the relation of linear dependence,

$$(-4)\mathbf{v}_1 + 0\mathbf{v}_2 + (-1)\mathbf{v}_3 + 1\mathbf{v}_4 = \mathbf{0}$$

Theorem DLDS guarantees that we can solve this relation of linear dependence for some vector in $R$, but the choice of which one is up to us. Notice however that $\mathbf{v}_2$ has a zero coefficient. In this case, we cannot choose to solve for $\mathbf{v}_2$. Maybe some other relation of linear dependence would produce a nonzero coefficient for $\mathbf{v}_2$ if we just had to solve for this vector. Unfortunately, this example has been engineered to always produce a zero coefficient here, as you can see from solving the homogeneous system. Every solution has $x_2 = 0$!

OK, if we are convinced that we cannot solve for $\mathbf{v}_2$, let us instead solve for $\mathbf{v}_3$,

$$\mathbf{v}_3 = (-4)\mathbf{v}_1 + 0\mathbf{v}_2 + 1\mathbf{v}_4 = (-4)\mathbf{v}_1 + 1\mathbf{v}_4$$

We now claim that this particular equation will allow us to write

$$V = \langle R \rangle = \langle \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\} \rangle = \langle \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_4\} \rangle$$

in essence declaring $\mathbf{v}_3$ as surplus for the task of building $V$ as a span. This claim is an equality of two sets, so we will use Definition SE to establish it carefully. Let $R' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_4\}$ and $V' = \langle R' \rangle$. We want to show that $V = V'$.

First show that $V' \subseteq V$. Since every vector of $R'$ is in $R$, any vector we can construct in $V'$ as a linear combination of vectors from $R'$ can also be constructed as a vector in $V$ by the same linear combination of the same vectors in $R$. That was easy, now turn it around.

Next show that $V \subseteq V'$. Choose any $\mathbf{v}$ from $V$. So there are scalars $\alpha_1, \alpha_2, \alpha_3, \alpha_4$ such that

$$\begin{aligned}
\mathbf{v} &= \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \alpha_3 \mathbf{v}_3 + \alpha_4 \mathbf{v}_4 \\
&= \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \alpha_3 \left( (-4)\mathbf{v}_1 + 1\mathbf{v}_4 \right) + \alpha_4 \mathbf{v}_4 \\
&= \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \left( (-4\alpha_3)\mathbf{v}_1 + \alpha_3 \mathbf{v}_4 \right) + \alpha_4 \mathbf{v}_4 \\
&= (\alpha_1 - 4\alpha_3)\, \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + (\alpha_3 + \alpha_4)\, \mathbf{v}_4.
\end{aligned}$$

This equation says that $\mathbf{v}$ can then be written as a linear combination of the vectors in $R'$ and hence qualifies for membership in $V'$. So $V \subseteq V'$ and we have established that $V = V'$.

If $R'$ was also linearly dependent (it is not), we could reduce the set even further. Notice that we could have chosen to eliminate any one of $\mathbf{v}_1$, $\mathbf{v}_3$ or $\mathbf{v}_4$, but somehow $\mathbf{v}_2$ is essential to the creation of $V$ since it cannot be replaced by any linear combination of $\mathbf{v}_1$, $\mathbf{v}_3$ or $\mathbf{v}_4$.

△
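The arithmetic of Example RSC5 is easy to check by machine. The Python sketch below, written with plain lists, verifies the relation of linear dependence on $R$ and then confirms, for one arbitrarily chosen set of scalars, that the rewritten coefficients $(\alpha_1 - 4\alpha_3,\ \alpha_2,\ \alpha_3 + \alpha_4)$ produce the same vector using only $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_4$. The helper names `lin_comb` and `reduce_coeffs` and the sample scalars are assumptions made for illustration.

```python
# The four vectors of R from Example RSC5.
v1 = [1, 2, -1, 3, 2]
v2 = [2, 1, 3, 1, 2]
v3 = [0, -7, 6, -11, -2]
v4 = [4, 1, 2, 1, 6]


def lin_comb(scalars, vectors):
    """Entry-by-entry linear combination of equal-length vectors."""
    size = len(vectors[0])
    return [sum(s * v[i] for s, v in zip(scalars, vectors)) for i in range(size)]


def reduce_coeffs(a1, a2, a3, a4):
    """Scalars on (v1, v2, v4) after substituting v3 = (-4)v1 + 1v4."""
    return (a1 - 4 * a3, a2, a3 + a4)


# The relation of linear dependence (-4)v1 + 0v2 + (-1)v3 + 1v4.
relation = lin_comb([-4, 0, -1, 1], [v1, v2, v3, v4])

# An arbitrary element of V, built once from R and once from R'.
a = (3, -2, 5, 7)
lhs = lin_comb(a, [v1, v2, v3, v4])
rhs = lin_comb(reduce_coeffs(*a), [v1, v2, v4])
```

A single numerical check like this is not a proof of $V \subseteq V'$, but it is a quick way to catch a sign error in the algebra.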


Subsection  COV 
Casting  Out  Vectors 


In  Example  RSC5  we  used  four  vectors  to  create  a span.  With  a relation  of  linear 
dependence  in  hand,  we  were  able  to  “toss  out”  one  of  these  four  vectors  and  create 
the  same  span  from  a subset  of  just  three  vectors  from  the  original  set  of  four.  We  did 
have  to  take  some  care  as  to  just  which  vector  we  tossed  out.  In  the  next  example, 
we  will  be  more  methodical  about  just  how  we  choose  to  eliminate  vectors  from  a 
linearly  dependent  set  while  preserving  a span. 


Example  COV  Casting  out  vectors 

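We begin with a set $S$ containing seven vectors from $\mathbb{C}^4$,

$$S = \left\{ \begin{bmatrix} 1 \\ 2 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 4 \\ 8 \\ 0 \\ -4 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ 2 \\ 2 \end{bmatrix}, \begin{bmatrix} -1 \\ 3 \\ -3 \\ 4 \end{bmatrix}, \begin{bmatrix} 0 \\ 9 \\ -4 \\ 8 \end{bmatrix}, \begin{bmatrix} 7 \\ -13 \\ 12 \\ -31 \end{bmatrix}, \begin{bmatrix} -9 \\ 7 \\ -8 \\ 37 \end{bmatrix} \right\}$$

and define $W = \langle S \rangle$.

The set $S$ is obviously linearly dependent by Theorem MVSLD, since we have $n = 7$ vectors from $\mathbb{C}^4$. So we can slim down $S$ some, and still create $W$ as the span of a smaller set of vectors.

As a device for identifying relations of linear dependence among the vectors of $S$, we place the seven column vectors of $S$ into a matrix as columns,

$$A = [\mathbf{A}_1 | \mathbf{A}_2 | \mathbf{A}_3 | \ldots | \mathbf{A}_7] = \begin{bmatrix} 1 & 4 & 0 & -1 & 0 & 7 & -9 \\ 2 & 8 & -1 & 3 & 9 & -13 & 7 \\ 0 & 0 & 2 & -3 & -4 & 12 & -8 \\ -1 & -4 & 2 & 4 & 8 & -31 & 37 \end{bmatrix}$$

By Theorem SLSLC a nontrivial solution to $\mathcal{LS}(A, \mathbf{0})$ will give us a nontrivial relation of linear dependence (Definition RLDCV) on the columns of $A$ (which are the elements of the set $S$). The row-reduced form for $A$ is the matrix

$$B = \begin{bmatrix} \boxed{1} & 4 & 0 & 0 & 2 & 1 & -3 \\ 0 & 0 & \boxed{1} & 0 & 1 & -3 & 5 \\ 0 & 0 & 0 & \boxed{1} & 2 & -6 & 6 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$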


so we can easily create solutions to the homogeneous system $\mathcal{LS}(A, \mathbf{0})$ using the free variables $x_2, x_5, x_6, x_7$. Any such solution will provide a relation of linear dependence on the columns of $A$. These solutions will allow us to solve for one column vector as a linear combination of some others, in the spirit of Theorem DLDS, and remove that vector from the set. We will set about forming these linear combinations methodically.

Set the free variable $x_2 = 1$, and set the other free variables to zero. Then a solution to $\mathcal{LS}(A, \mathbf{0})$ is

$$\mathbf{x} = \begin{bmatrix} -4 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$

which can be used to create the linear combination

$$(-4)\mathbf{A}_1 + 1\mathbf{A}_2 + 0\mathbf{A}_3 + 0\mathbf{A}_4 + 0\mathbf{A}_5 + 0\mathbf{A}_6 + 0\mathbf{A}_7 = \mathbf{0}$$

This can then be arranged and solved for $\mathbf{A}_2$, resulting in $\mathbf{A}_2$ expressed as a linear combination of $\{\mathbf{A}_1, \mathbf{A}_3, \mathbf{A}_4\}$,

$$\mathbf{A}_2 = 4\mathbf{A}_1 + 0\mathbf{A}_3 + 0\mathbf{A}_4$$

This means that $\mathbf{A}_2$ is surplus, and we can create $W$ just as well with a smaller set with this vector removed,

$$W = \langle \{\mathbf{A}_1, \mathbf{A}_3, \mathbf{A}_4, \mathbf{A}_5, \mathbf{A}_6, \mathbf{A}_7\} \rangle$$

Technically, this set equality for $W$ requires a proof, in the spirit of Example RSC5, but we will bypass this requirement here, and in the next few paragraphs.

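Now, set the free variable $x_5 = 1$, and set the other free variables to zero. Then a solution to $\mathcal{LS}(B, \mathbf{0})$ is

$$\mathbf{x} = \begin{bmatrix} -2 \\ 0 \\ -1 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}$$

which can be used to create the linear combination

$$(-2)\mathbf{A}_1 + 0\mathbf{A}_2 + (-1)\mathbf{A}_3 + (-2)\mathbf{A}_4 + 1\mathbf{A}_5 + 0\mathbf{A}_6 + 0\mathbf{A}_7 = \mathbf{0}$$

This can then be arranged and solved for $\mathbf{A}_5$, resulting in $\mathbf{A}_5$ expressed as a linear combination of $\{\mathbf{A}_1, \mathbf{A}_3, \mathbf{A}_4\}$,

$$\mathbf{A}_5 = 2\mathbf{A}_1 + 1\mathbf{A}_3 + 2\mathbf{A}_4$$

This means that $\mathbf{A}_5$ is surplus, and we can create $W$ just as well with a smaller set with this vector removed,

$$W = \langle \{\mathbf{A}_1, \mathbf{A}_3, \mathbf{A}_4, \mathbf{A}_6, \mathbf{A}_7\} \rangle$$

Do it again, set the free variable $x_6 = 1$, and set the other free variables to zero. Then a solution to $\mathcal{LS}(B, \mathbf{0})$ is

$$\mathbf{x} = \begin{bmatrix} -1 \\ 0 \\ 3 \\ 6 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$

which can be used to create the linear combination

$$(-1)\mathbf{A}_1 + 0\mathbf{A}_2 + 3\mathbf{A}_3 + 6\mathbf{A}_4 + 0\mathbf{A}_5 + 1\mathbf{A}_6 + 0\mathbf{A}_7 = \mathbf{0}$$

This can then be arranged and solved for $\mathbf{A}_6$, resulting in $\mathbf{A}_6$ expressed as a linear combination of $\{\mathbf{A}_1, \mathbf{A}_3, \mathbf{A}_4\}$,

$$\mathbf{A}_6 = 1\mathbf{A}_1 + (-3)\mathbf{A}_3 + (-6)\mathbf{A}_4$$

This means that $\mathbf{A}_6$ is surplus, and we can create $W$ just as well with a smaller set with this vector removed,

$$W = \langle \{\mathbf{A}_1, \mathbf{A}_3, \mathbf{A}_4, \mathbf{A}_7\} \rangle$$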

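Set the free variable $x_7 = 1$, and set the other free variables to zero. Then a solution to $\mathcal{LS}(B, \mathbf{0})$ is

$$\mathbf{x} = \begin{bmatrix} 3 \\ 0 \\ -5 \\ -6 \\ 0 \\ 0 \\ 1 \end{bmatrix}$$

which can be used to create the linear combination

$$3\mathbf{A}_1 + 0\mathbf{A}_2 + (-5)\mathbf{A}_3 + (-6)\mathbf{A}_4 + 0\mathbf{A}_5 + 0\mathbf{A}_6 + 1\mathbf{A}_7 = \mathbf{0}$$

This can then be arranged and solved for $\mathbf{A}_7$, resulting in $\mathbf{A}_7$ expressed as a linear combination of $\{\mathbf{A}_1, \mathbf{A}_3, \mathbf{A}_4\}$,

$$\mathbf{A}_7 = (-3)\mathbf{A}_1 + 5\mathbf{A}_3 + 6\mathbf{A}_4$$

This means that $\mathbf{A}_7$ is surplus, and we can create $W$ just as well with a smaller set with this vector removed,

$$W = \langle \{\mathbf{A}_1, \mathbf{A}_3, \mathbf{A}_4\} \rangle$$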

You might think we could keep this up, but we have run out of free variables. And not coincidentally, the set $\{\mathbf{A}_1, \mathbf{A}_3, \mathbf{A}_4\}$ is linearly independent (check this!). It should be clear how each free variable was used to eliminate a column from the set used to span the column space, as this will be the essence of the proof of the next theorem. The column vectors in $S$ were not chosen entirely at random, they are the columns of Archetype I. See if you can mimic this example using the columns of Archetype J. Go ahead, we'll go grab a cup of coffee and be back before you finish up.

For extra credit, notice that the vector

$$\mathbf{b} = \begin{bmatrix} 3 \\ 9 \\ 1 \\ 4 \end{bmatrix}$$

is the vector of constants in the definition of Archetype I. Since the system $\mathcal{LS}(A, \mathbf{b})$ is consistent, we know by Theorem SLSLC that $\mathbf{b}$ is a linear combination of the columns of $A$, or stated equivalently, $\mathbf{b} \in W$. This means that $\mathbf{b}$ must also be a linear combination of just the three columns $\mathbf{A}_1, \mathbf{A}_3, \mathbf{A}_4$. Can you find such a linear combination? Did you notice that there is just a single (unique) answer? Hmmmm.
△

Example  COV  deserves  your  careful  attention,  since  this  important  example 
motivates  the  following  very  fundamental  theorem. 

Theorem  BS  Basis  of  a Span 

Suppose that $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_n\}$ is a set of column vectors. Define $W = \langle S \rangle$ and let $A$ be the matrix whose columns are the vectors from $S$. Let $B$ be the reduced row-echelon form of $A$, with $D = \{d_1, d_2, d_3, \ldots, d_r\}$ the set of indices for the pivot columns of $B$. Then

1. $T = \{\mathbf{v}_{d_1}, \mathbf{v}_{d_2}, \mathbf{v}_{d_3}, \ldots, \mathbf{v}_{d_r}\}$ is a linearly independent set.

2. $W = \langle T \rangle$.

Proof. To prove that $T$ is linearly independent, begin with a relation of linear dependence on $T$,

$$\mathbf{0} = \alpha_1 \mathbf{v}_{d_1} + \alpha_2 \mathbf{v}_{d_2} + \alpha_3 \mathbf{v}_{d_3} + \cdots + \alpha_r \mathbf{v}_{d_r}$$

and we will try to conclude that the only possibility for the scalars $\alpha_i$ is that they are all zero. Denote the non-pivot columns of $B$ by $F = \{f_1, f_2, f_3, \ldots, f_{n-r}\}$. Then we can preserve the equality by adding a big fat zero to the linear combination,

$$\mathbf{0} = \alpha_1 \mathbf{v}_{d_1} + \alpha_2 \mathbf{v}_{d_2} + \alpha_3 \mathbf{v}_{d_3} + \cdots + \alpha_r \mathbf{v}_{d_r} + 0\mathbf{v}_{f_1} + 0\mathbf{v}_{f_2} + 0\mathbf{v}_{f_3} + \cdots + 0\mathbf{v}_{f_{n-r}}$$

By Theorem SLSLC, the scalars in this linear combination (suitably reordered) are a solution to the homogeneous system $\mathcal{LS}(A, \mathbf{0})$. But notice that this is the solution obtained by setting each free variable to zero. If we consider the description of a solution vector in the conclusion of Theorem VFSLS, in the case of a homogeneous system, then we see that if all the free variables are set to zero the resulting solution vector is trivial (all zeros). So it must be that $\alpha_i = 0$, $1 \le i \le r$. This implies by Definition LICV that $T$ is a linearly independent set.

The second conclusion of this theorem is an equality of sets (Definition SE). Since $T$ is a subset of $S$, any linear combination of elements of the set $T$ can also be viewed as a linear combination of elements of the set $S$. So $\langle T \rangle \subseteq \langle S \rangle = W$. It remains to prove that $W = \langle S \rangle \subseteq \langle T \rangle$.

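For each $k$, $1 \le k \le n - r$, form a solution $\mathbf{x}$ to $\mathcal{LS}(A, \mathbf{0})$ by setting the free variables as follows:

$$x_{f_1} = 0 \quad x_{f_2} = 0 \quad x_{f_3} = 0 \quad \cdots \quad x_{f_k} = 1 \quad \cdots \quad x_{f_{n-r}} = 0$$

By Theorem VFSLS, the remainder of this solution vector is given by,

$$x_{d_1} = -[B]_{1, f_k} \quad x_{d_2} = -[B]_{2, f_k} \quad x_{d_3} = -[B]_{3, f_k} \quad \cdots \quad x_{d_r} = -[B]_{r, f_k}$$

From this solution, we obtain a relation of linear dependence on the columns of $A$,

$$-[B]_{1, f_k} \mathbf{v}_{d_1} - [B]_{2, f_k} \mathbf{v}_{d_2} - [B]_{3, f_k} \mathbf{v}_{d_3} - \cdots - [B]_{r, f_k} \mathbf{v}_{d_r} + 1\mathbf{v}_{f_k} = \mathbf{0}$$

which can be arranged as the equality

$$\mathbf{v}_{f_k} = [B]_{1, f_k} \mathbf{v}_{d_1} + [B]_{2, f_k} \mathbf{v}_{d_2} + [B]_{3, f_k} \mathbf{v}_{d_3} + \cdots + [B]_{r, f_k} \mathbf{v}_{d_r}$$

Now, suppose we take an arbitrary element, $\mathbf{w}$, of $W = \langle S \rangle$ and write it as a linear combination of the elements of $S$, but with the terms organized according to the indices in $D$ and $F$,

$$\mathbf{w} = \alpha_1 \mathbf{v}_{d_1} + \alpha_2 \mathbf{v}_{d_2} + \cdots + \alpha_r \mathbf{v}_{d_r} + \beta_1 \mathbf{v}_{f_1} + \beta_2 \mathbf{v}_{f_2} + \cdots + \beta_{n-r} \mathbf{v}_{f_{n-r}}$$

From the above, we can replace each $\mathbf{v}_{f_j}$ by a linear combination of the $\mathbf{v}_{d_i}$,

$$\begin{aligned}
\mathbf{w} = {} & \alpha_1 \mathbf{v}_{d_1} + \alpha_2 \mathbf{v}_{d_2} + \cdots + \alpha_r \mathbf{v}_{d_r} + {} \\
& \beta_1 \left( [B]_{1, f_1} \mathbf{v}_{d_1} + [B]_{2, f_1} \mathbf{v}_{d_2} + [B]_{3, f_1} \mathbf{v}_{d_3} + \cdots + [B]_{r, f_1} \mathbf{v}_{d_r} \right) + {} \\
& \beta_2 \left( [B]_{1, f_2} \mathbf{v}_{d_1} + [B]_{2, f_2} \mathbf{v}_{d_2} + [B]_{3, f_2} \mathbf{v}_{d_3} + \cdots + [B]_{r, f_2} \mathbf{v}_{d_r} \right) + {} \\
& \qquad \vdots \\
& \beta_{n-r} \left( [B]_{1, f_{n-r}} \mathbf{v}_{d_1} + [B]_{2, f_{n-r}} \mathbf{v}_{d_2} + [B]_{3, f_{n-r}} \mathbf{v}_{d_3} + \cdots + [B]_{r, f_{n-r}} \mathbf{v}_{d_r} \right)
\end{aligned}$$

With repeated applications of several of the properties of Theorem VSPCV we can rearrange this expression as,

$$\begin{aligned}
= {} & \left( \alpha_1 + \beta_1 [B]_{1, f_1} + \beta_2 [B]_{1, f_2} + \beta_3 [B]_{1, f_3} + \cdots + \beta_{n-r} [B]_{1, f_{n-r}} \right) \mathbf{v}_{d_1} + {} \\
& \left( \alpha_2 + \beta_1 [B]_{2, f_1} + \beta_2 [B]_{2, f_2} + \beta_3 [B]_{2, f_3} + \cdots + \beta_{n-r} [B]_{2, f_{n-r}} \right) \mathbf{v}_{d_2} + {} \\
& \qquad \vdots \\
& \left( \alpha_r + \beta_1 [B]_{r, f_1} + \beta_2 [B]_{r, f_2} + \beta_3 [B]_{r, f_3} + \cdots + \beta_{n-r} [B]_{r, f_{n-r}} \right) \mathbf{v}_{d_r}
\end{aligned}$$

This mess expresses the vector $\mathbf{w}$ as a linear combination of the vectors in

$$T = \{\mathbf{v}_{d_1}, \mathbf{v}_{d_2}, \mathbf{v}_{d_3}, \ldots, \mathbf{v}_{d_r}\}$$

thus saying that $\mathbf{w} \in \langle T \rangle$. Therefore, $W = \langle S \rangle \subseteq \langle T \rangle$. ■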
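The procedure of Theorem BS (row-reduce, then keep the vectors sitting in pivot columns) can be sketched in Python as follows. This is an illustrative sketch rather than the book's Sage code: the names `rref` and `basis_of_span` are assumptions, column indices are 0-based, and the test data are the seven columns of the matrix from Example COV.

```python
from fractions import Fraction


def rref(rows):
    """Return (B, pivots): reduced row-echelon form and 0-based pivot columns."""
    B = [[Fraction(x) for x in row] for row in rows]
    m, n = len(B), len(B[0])
    pivots, r = [], 0
    for c in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if B[i][c] != 0), None)
        if p is None:
            continue  # column c has no pivot
        B[r], B[p] = B[p], B[r]
        lead = B[r][c]
        B[r] = [x / lead for x in B[r]]
        for i in range(m):
            if i != r:
                factor = B[i][c]
                B[i] = [a - factor * b for a, b in zip(B[i], B[r])]
        pivots.append(c)
        r += 1
    return B, pivots


def basis_of_span(vectors):
    """Return (D, T): pivot indices and the vectors of S in those positions."""
    size = len(vectors[0])
    # Place the vectors into a matrix as columns.
    A = [[v[i] for v in vectors] for i in range(size)]
    _, D = rref(A)
    return D, [vectors[d] for d in D]


# The seven vectors of S from Example COV (columns of Archetype I).
S = [[1, 2, 0, -1], [4, 8, 0, -4], [0, -1, 2, 2], [-1, 3, -3, 4],
     [0, 9, -4, 8], [7, -13, 12, -31], [-9, 7, -8, 37]]
D, T = basis_of_span(S)
```

With 0-based indices, `D` comes out as `[0, 2, 3]`, matching the set $\{\mathbf{A}_1, \mathbf{A}_3, \mathbf{A}_4\}$ reached one vector at a time in Example COV.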


In  Example  COV,  we  tossed-out  vectors  one  at  a time.  But  in  each  instance, 
we  rewrote  the  offending  vector  as  a linear  combination  of  those  vectors  with  the 
column  indices  of  the  pivot  columns  of  the  reduced  row-echelon  form  of  the  matrix 
of  columns.  In  the  proof  of  Theorem  BS,  we  accomplish  this  reduction  in  one  big 
step.  In  Example  COV  we  arrived  at  a linearly  independent  set  at  exactly  the  same 
moment  that  we  ran  out  of  free  variables  to  exploit.  This  was  not  a coincidence,  it 
is  the  substance  of  our  conclusion  of  linear  independence  in  Theorem  BS. 

Here  is  a straightforward  application  of  Theorem  BS. 

Example  RSC4  Reducing  a span  in  C4 
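Begin with a set of five vectors from $\mathbb{C}^4$,

$$S = \left\{ \begin{bmatrix} 1 \\ 1 \\ 2 \\ 1 \end{bmatrix},\ \begin{bmatrix} 2 \\ 2 \\ 4 \\ 2 \end{bmatrix},\ \begin{bmatrix} 2 \\ 0 \\ -1 \\ 1 \end{bmatrix},\ \begin{bmatrix} 7 \\ 1 \\ -1 \\ 4 \end{bmatrix},\ \begin{bmatrix} 0 \\ 2 \\ 5 \\ 1 \end{bmatrix} \right\}$$

and let $W = \langle S \rangle$. To arrive at a (smaller) linearly independent set, follow the procedure described in Theorem BS. Place the vectors from $S$ into a matrix as columns, and row-reduce,

$$\begin{bmatrix} 1 & 2 & 2 & 7 & 0 \\ 1 & 2 & 0 & 1 & 2 \\ 2 & 4 & -1 & -1 & 5 \\ 1 & 2 & 1 & 4 & 1 \end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} \boxed{1} & 2 & 0 & 1 & 2 \\ 0 & 0 & \boxed{1} & 3 & -1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

Columns 1 and 3 are the pivot columns ($D = \{1, 3\}$) so the set

$$T = \left\{ \begin{bmatrix} 1 \\ 1 \\ 2 \\ 1 \end{bmatrix},\ \begin{bmatrix} 2 \\ 0 \\ -1 \\ 1 \end{bmatrix} \right\}$$

is linearly independent and $\langle T \rangle = \langle S \rangle = W$. Boom!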

Since the reduced row-echelon form of a matrix is unique (Theorem RREFU), the procedure of Theorem BS leads us to a unique set $T$. However, there is a wide variety of possibilities for sets $T$ that are linearly independent and which can be employed in a span to create $W$. Without proof, we list two other possibilities:


Can you prove that $T'$ and $T^*$ are linearly independent sets and $W = \langle S \rangle = \langle T' \rangle = \langle T^* \rangle$?
△


Example  RES  Reworking  elements  of  a span 
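Begin with a set of five vectors from $\mathbb{C}^4$,

$$R = \left\{ \begin{bmatrix} 2 \\ 1 \\ 3 \\ 2 \end{bmatrix},\ \begin{bmatrix} -1 \\ 1 \\ 0 \\ 1 \end{bmatrix},\ \begin{bmatrix} -8 \\ -1 \\ -9 \\ -4 \end{bmatrix},\ \begin{bmatrix} 3 \\ 1 \\ -1 \\ -2 \end{bmatrix},\ \begin{bmatrix} -10 \\ -1 \\ -1 \\ 4 \end{bmatrix} \right\}$$

It is easy to create elements of $X = \langle R \rangle$; we will create one at random,

$$\mathbf{y} = 6 \begin{bmatrix} 2 \\ 1 \\ 3 \\ 2 \end{bmatrix} + (-7) \begin{bmatrix} -1 \\ 1 \\ 0 \\ 1 \end{bmatrix} + 1 \begin{bmatrix} -8 \\ -1 \\ -9 \\ -4 \end{bmatrix} + 6 \begin{bmatrix} 3 \\ 1 \\ -1 \\ -2 \end{bmatrix} + 2 \begin{bmatrix} -10 \\ -1 \\ -1 \\ 4 \end{bmatrix} = \begin{bmatrix} 9 \\ 2 \\ 1 \\ -3 \end{bmatrix}$$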

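We know we can replace $R$ by a smaller set (since it is obviously linearly dependent by Theorem MVSLD) that will create the same span. Here goes,

$$\begin{bmatrix} 2 & -1 & -8 & 3 & -10 \\ 1 & 1 & -1 & 1 & -1 \\ 3 & 0 & -9 & -1 & -1 \\ 2 & 1 & -4 & -2 & 4 \end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} \boxed{1} & 0 & -3 & 0 & -1 \\ 0 & \boxed{1} & 2 & 0 & 2 \\ 0 & 0 & 0 & \boxed{1} & -2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$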

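So, if we collect the first, second and fourth vectors from $R$,

$$P = \left\{ \begin{bmatrix} 2 \\ 1 \\ 3 \\ 2 \end{bmatrix},\ \begin{bmatrix} -1 \\ 1 \\ 0 \\ 1 \end{bmatrix},\ \begin{bmatrix} 3 \\ 1 \\ -1 \\ -2 \end{bmatrix} \right\}$$

then $P$ is linearly independent and $\langle P \rangle = \langle R \rangle = X$ by Theorem BS. Since we built $\mathbf{y}$ as an element of $\langle R \rangle$ it must also be an element of $\langle P \rangle$. Can we write $\mathbf{y}$ as a linear combination of just the three vectors in $P$? The answer is, of course, yes. But let us compute an explicit linear combination just for fun. By Theorem SLSLC we can get such a linear combination by solving a system of equations with the column vectors of $P$ as the columns of a coefficient matrix, and $\mathbf{y}$ as the vector of constants.

Employing an augmented matrix to solve this system,

$$\left[ \begin{array}{ccc|c} 2 & -1 & 3 & 9 \\ 1 & 1 & 1 & 2 \\ 3 & 0 & -1 & 1 \\ 2 & 1 & -2 & -3 \end{array} \right] \xrightarrow{\text{RREF}} \left[ \begin{array}{ccc|c} \boxed{1} & 0 & 0 & 1 \\ 0 & \boxed{1} & 0 & -1 \\ 0 & 0 & \boxed{1} & 2 \\ 0 & 0 & 0 & 0 \end{array} \right]$$

So we see, as expected, that

$$1 \begin{bmatrix} 2 \\ 1 \\ 3 \\ 2 \end{bmatrix} + (-1) \begin{bmatrix} -1 \\ 1 \\ 0 \\ 1 \end{bmatrix} + 2 \begin{bmatrix} 3 \\ 1 \\ -1 \\ -2 \end{bmatrix} = \begin{bmatrix} 9 \\ 2 \\ 1 \\ -3 \end{bmatrix} = \mathbf{y}$$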

A key feature of this example is that the linear combination that expresses $\mathbf{y}$ as a linear combination of the vectors in $P$ is unique. This is a consequence of the linear independence of $P$. The linearly independent set $P$ is smaller than $R$, but still just (barely) big enough to create elements of the set $X = \langle R \rangle$. There are many, many ways to write $\mathbf{y}$ as a linear combination of the five vectors in $R$ (the appropriate system of equations to verify this claim yields two free variables in the description of the solution set), yet there is precisely one way to write $\mathbf{y}$ as a linear combination of the three vectors in $P$. △
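The computation in Example RES, finding the unique scalars that express $\mathbf{y}$ in terms of $P$, can be sketched in Python by row-reducing the augmented matrix. The helper names `rref` and `coordinates` are assumptions made for this sketch; it raises an error, rather than returning one of many answers, when the given vectors are linearly dependent.

```python
from fractions import Fraction


def rref(rows):
    """Return (B, pivots): reduced row-echelon form and 0-based pivot columns."""
    B = [[Fraction(x) for x in row] for row in rows]
    m, n = len(B), len(B[0])
    pivots, r = [], 0
    for c in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if B[i][c] != 0), None)
        if p is None:
            continue
        B[r], B[p] = B[p], B[r]
        lead = B[r][c]
        B[r] = [x / lead for x in B[r]]
        for i in range(m):
            if i != r:
                factor = B[i][c]
                B[i] = [a - factor * b for a, b in zip(B[i], B[r])]
        pivots.append(c)
        r += 1
    return B, pivots


def coordinates(vectors, y):
    """Unique scalars c with sum(c[k] * vectors[k]) == y, via Theorem SLSLC."""
    n = len(vectors)
    # Augmented matrix: the vectors as columns, y as the final column.
    aug = [[v[i] for v in vectors] + [y[i]] for i in range(len(y))]
    B, pivots = rref(aug)
    if n in pivots:
        raise ValueError("y is not in the span of the given vectors")
    if pivots != list(range(n)):
        raise ValueError("vectors are linearly dependent; scalars are not unique")
    return [B[k][n] for k in range(n)]


P = [[2, 1, 3, 2], [-1, 1, 0, 1], [3, 1, -1, -2]]
y = [9, 2, 1, -3]
c = coordinates(P, y)
```

Calling `coordinates` with all five vectors of $R$ in place of $P$ would raise the second error, which is the computational face of the "many, many ways" remark above.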


Reading  Questions 


1.  Let  S be  the  linearly  dependent  set  of  three  vectors  below. 


Write  one  vector  from  S as  a linear  combination  of  the  other  two  and  include  this 
vector  equality  in  your  response.  (You  should  be  able  to  do  this  on  sight,  rather  than 
doing  some  computations.)  Convert  this  expression  into  a nontrivial  relation  of  linear 
dependence  on  S. 

2.  Explain  why  the  word  “dependent”  is  used  in  the  definition  of  linear  dependence. 

3. Suppose that $Y = \langle P \rangle = \langle Q \rangle$, where $P$ is a linearly dependent set and $Q$ is linearly independent. Would you rather use $P$ or $Q$ to describe $Y$? Why?

Exercises 

C20† Let $T$ be the set of columns of the matrix $B$ below. Define $W = \langle T \rangle$. Find a set $R$ so that (1) $R$ has 3 vectors, (2) $R$ is a subset of $T$, and (3) $W = \langle R \rangle$.

$$B = \begin{bmatrix} -3 & 1 & -2 & 7 \\ -1 & 2 & 1 & 4 \\ 1 & 1 & 2 & -1 \end{bmatrix}$$

C40 Verify that the set $R' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_4\}$ at the end of Example RSC5 is linearly independent.

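C50† Consider the set of vectors from $\mathbb{C}^3$, $W$, given below. Find a linearly independent set $T$ that contains three vectors from $W$ and such that $\langle W \rangle = \langle T \rangle$.


$$W = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4, \mathbf{v}_5\} =$$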


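C51† Given the set $S$ below, find a linearly independent set $T$ so that $\langle T \rangle = \langle S \rangle$.

$$S = \left\{ \begin{bmatrix} 2 \\ -1 \\ 2 \end{bmatrix},\ \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix},\ \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix},\ \begin{bmatrix} 5 \\ -1 \\ 3 \end{bmatrix} \right\}$$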

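C52† Let $W$ be the span of the set of vectors $S$ below, $W = \langle S \rangle$. Find a set $T$ so that (1) the span of $T$ is $W$, $\langle T \rangle = W$, (2) $T$ is a linearly independent set, and (3) $T$ is a subset of $S$.

$$S = \left\{ \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix},\ \begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix},\ \begin{bmatrix} 4 \\ 1 \\ -1 \end{bmatrix},\ \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix},\ \begin{bmatrix} 3 \\ -1 \\ 0 \end{bmatrix} \right\}$$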

C55† Let $T$ be the set of vectors $T$


Find two different subsets of $T$, named $R$ and $S$, so that $R$ and $S$ each contain three vectors, and so that $\langle R \rangle = \langle T \rangle$ and $\langle S \rangle = \langle T \rangle$. Prove that both $R$ and $S$ are linearly independent.


C70  Reprise  Example  RES  by  creating  a new  version  of  the  vector  y.  In  other  words, 
form  a new,  different  linear  combination  of  the  vectors  in  R to  create  a new  vector  y (but 
do  not  simplify  the  problem  too  much  by  choosing  any  of  the  five  new  scalars  to  be  zero). 
Then  express  this  new  y as  a combination  of  the  vectors  in  P. 

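M10 At the conclusion of Example RSC4 two alternative solutions, sets $T'$ and $T^*$, are proposed. Verify these claims by proving that $\langle T \rangle = \langle T' \rangle$ and $\langle T \rangle = \langle T^* \rangle$.

T40† Suppose that $\mathbf{v}_1$ and $\mathbf{v}_2$ are any two vectors from $\mathbb{C}^m$. Prove the following set equality.


$$\langle \{\mathbf{v}_1, \mathbf{v}_2\} \rangle = \langle \{\mathbf{v}_1 + \mathbf{v}_2,\ \mathbf{v}_1 - \mathbf{v}_2\} \rangle$$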


Section  O 
Orthogonality


In this section we define a couple more operations with vectors, and prove a few theorems. At first blush these definitions and results will not appear central to what follows, but we will make use of them at key points in the remainder of the course (such as Section MINM, Section OD). Because we have chosen to use $\mathbb{C}$ as our set of scalars, this subsection is a bit more, uh, ... complex than it would be for the real numbers. We will explain as we go along how things get easier for the real numbers $\mathbb{R}$. If you have not already, now would be a good time to review some of the basic properties of arithmetic with complex numbers described in Section CNO. With that done, we can extend the basics of complex number arithmetic to our study of vectors in $\mathbb{C}^m$.


Subsection  CAV 

Complex  Arithmetic  and  Vectors 

We  know  how  the  addition  and  multiplication  of  complex  numbers  is  employed  in 
defining  the  operations  for  vectors  in  Cm  (Definition  CVA  and  Definition  CVSM). 
We  can  also  extend  the  idea  of  the  conjugate  to  vectors. 

Definition  CCCV  Complex  Conjugate  of  a Column  Vector 

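Suppose that $\mathbf{u}$ is a vector from $\mathbb{C}^m$. Then the conjugate of the vector, $\overline{\mathbf{u}}$, is defined
by

\[
[\overline{\mathbf{u}}]_i = \overline{[\mathbf{u}]_i} \qquad 1 \le i \le m
\]

□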


With  this  definition  we  can  show  that  the  conjugate  of  a column  vector  behaves 
as  we  would  expect  with  regard  to  vector  addition  and  scalar  multiplication. 

Theorem  CRVA  Conjugation  Respects  Vector  Addition 
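Suppose $\mathbf{x}$ and $\mathbf{y}$ are two vectors from $\mathbb{C}^m$. Then

\[
\overline{\mathbf{x} + \mathbf{y}} = \overline{\mathbf{x}} + \overline{\mathbf{y}}
\]

Proof. For each $1 \le i \le m$,

\begin{align*}
[\overline{\mathbf{x} + \mathbf{y}}]_i &= \overline{[\mathbf{x} + \mathbf{y}]_i} && \text{Definition CCCV} \\
&= \overline{[\mathbf{x}]_i + [\mathbf{y}]_i} && \text{Definition CVA} \\
&= \overline{[\mathbf{x}]_i} + \overline{[\mathbf{y}]_i} && \text{Theorem CCRA} \\
&= [\overline{\mathbf{x}}]_i + [\overline{\mathbf{y}}]_i && \text{Definition CCCV} \\
&= [\overline{\mathbf{x}} + \overline{\mathbf{y}}]_i && \text{Definition CVA}
\end{align*}

Then by Definition CVE we have $\overline{\mathbf{x} + \mathbf{y}} = \overline{\mathbf{x}} + \overline{\mathbf{y}}$.  ■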


Theorem  CRSM  Conjugation  Respects  Vector  Scalar  Multiplication 
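Suppose $\mathbf{x}$ is a vector from $\mathbb{C}^m$, and $\alpha \in \mathbb{C}$ is a scalar. Then

\[
\overline{\alpha \mathbf{x}} = \overline{\alpha}\, \overline{\mathbf{x}}
\]

Proof. For $1 \le i \le m$,

\begin{align*}
[\overline{\alpha \mathbf{x}}]_i &= \overline{[\alpha \mathbf{x}]_i} && \text{Definition CCCV} \\
&= \overline{\alpha [\mathbf{x}]_i} && \text{Definition CVSM} \\
&= \overline{\alpha}\, \overline{[\mathbf{x}]_i} && \text{Theorem CCRM} \\
&= \overline{\alpha}\, [\overline{\mathbf{x}}]_i && \text{Definition CCCV} \\
&= [\overline{\alpha}\, \overline{\mathbf{x}}]_i && \text{Definition CVSM}
\end{align*}

Then by Definition CVE we have $\overline{\alpha \mathbf{x}} = \overline{\alpha}\, \overline{\mathbf{x}}$.  ■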

These  two  theorems  together  tell  us  how  we  can  “push”  complex  conjugation 
through  linear  combinations. 
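
As a quick numerical illustration of "pushing" conjugation through a linear combination, here is a minimal Python sketch. The helper names `conj_vec` and `lin_comb`, and the sample vectors and scalars, are assumptions made for this illustration only; vectors are modelled as plain lists of Python complex numbers.

```python
def conj_vec(u):
    """Entry-by-entry conjugate of a column vector (Definition CCCV)."""
    return [z.conjugate() for z in u]


def lin_comb(scalars, vectors):
    """Linear combination a_1 v_1 + ... + a_n v_n, computed entry by entry."""
    m = len(vectors[0])
    return [sum(a * v[i] for a, v in zip(scalars, vectors)) for i in range(m)]


# Sample data (hypothetical, chosen only for illustration).
x = [2 + 3j, 1 - 1j, 0 + 4j]
y = [-1 + 2j, 5 + 0j, 3 - 2j]
a, b = 2 - 1j, 1 + 4j

# Conjugate of the linear combination a*x + b*y ...
left = conj_vec(lin_comb([a, b], [x, y]))
# ... versus the combination of the conjugated vectors with conjugated scalars.
right = lin_comb([a.conjugate(), b.conjugate()], [conj_vec(x), conj_vec(y)])

agree = all(abs(l - r) < 1e-12 for l, r in zip(left, right))
```

Here `agree` should come out true, which is what Theorem CRVA and Theorem CRSM predict when applied term by term.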

Subsection  IP 
Inner  products 

Definition  IP  Inner  Product 

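Given the vectors $\mathbf{u}, \mathbf{v} \in \mathbb{C}^m$ the inner product of $\mathbf{u}$ and $\mathbf{v}$ is the scalar quantity
in $\mathbb{C}$,

\[
\langle \mathbf{u}, \mathbf{v} \rangle
= \overline{[\mathbf{u}]_1}\,[\mathbf{v}]_1 + \overline{[\mathbf{u}]_2}\,[\mathbf{v}]_2 + \overline{[\mathbf{u}]_3}\,[\mathbf{v}]_3 + \cdots + \overline{[\mathbf{u}]_m}\,[\mathbf{v}]_m
= \sum_{i=1}^{m} \overline{[\mathbf{u}]_i}\,[\mathbf{v}]_i
\]

□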

This  operation  is  a bit  different  in  that  we  begin  with  two  vectors  but  produce  a 
scalar.  Computing  one  is  straightforward. 

Example  CSIP  Computing  some  inner  products 
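The inner product of

\[
\mathbf{u} = \begin{bmatrix} 2 + 3i \\ 5 + 2i \\ -3 + i \end{bmatrix}
\quad\text{and}\quad
\mathbf{v} = \begin{bmatrix} 1 + 2i \\ -4 + 5i \\ 0 + 5i \end{bmatrix}
\]

is

\begin{align*}
\langle \mathbf{u}, \mathbf{v} \rangle &= \overline{(2 + 3i)}(1 + 2i) + \overline{(5 + 2i)}(-4 + 5i) + \overline{(-3 + i)}(0 + 5i) \\
&= (2 - 3i)(1 + 2i) + (5 - 2i)(-4 + 5i) + (-3 - i)(0 + 5i) \\
&= (8 + i) + (-10 + 33i) + (5 - 15i) \\
&= 3 + 19i
\end{align*}

The inner product of

\[
\mathbf{w} = \begin{bmatrix} 2 \\ 4 \\ -3 \\ 2 \\ 8 \end{bmatrix}
\quad\text{and}\quad
\mathbf{x} = \begin{bmatrix} 3 \\ 1 \\ 0 \\ -1 \\ -2 \end{bmatrix}
\]

is

\begin{align*}
\langle \mathbf{w}, \mathbf{x} \rangle &= \overline{2}(3) + \overline{4}(1) + \overline{(-3)}(0) + \overline{2}(-1) + \overline{8}(-2) \\
&= 2(3) + 4(1) + (-3)0 + 2(-1) + 8(-2) = -8.
\end{align*}

△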

In  the  case  where  the  entries  of  our  vectors  are  all  real  numbers  (as  in  the  second 
part  of  Example  CSIP),  the  computation  of  the  inner  product  may  look  familiar  and 
be  known  to  you  as  a dot  product  or  scalar  product.  So  you  can  view  the  inner 
product  as  a generalization  of  the  scalar  product  to  vectors  from  Cm  (rather  than 
Rm). 
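
The computations of Example CSIP can be reproduced with a short Python sketch. The function name `inner` is an assumption of this illustration; it conjugates the entries of the first vector, following Definition IP, and uses only Python's built-in complex arithmetic.

```python
def inner(u, v):
    """Inner product that conjugates the entries of the FIRST vector (Definition IP)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same size")
    return sum(a.conjugate() * b for a, b in zip(u, v))


# Vectors from Example CSIP.
u = [2 + 3j, 5 + 2j, -3 + 1j]
v = [1 + 2j, -4 + 5j, 0 + 5j]
w = [2, 4, -3, 2, 8]
x = [3, 1, 0, -1, -2]

uv = inner(u, v)  # complex entries: 3 + 19i
wx = inner(w, x)  # real entries: the ordinary dot product, -8
```

For real entries the conjugation does nothing, so `inner` reduces to the familiar dot product.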

Note that we have chosen to conjugate the entries of the first vector listed in the
inner product, while it is almost equally feasible to conjugate entries from the second
vector instead. In particular, prior to Version 2.90, we did use the latter definition,
and this has now changed to the former, with resulting adjustments propagated
up through Section CB (only). However, conjugating the first vector leads to much
nicer formulas for certain matrix decompositions and also shortens some proofs.

There  are  several  quick  theorems  we  can  now  prove,  and  they  will  each  be  useful 
later. 

Theorem  IPVA  Inner  Product  and  Vector  Addition 
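Suppose $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathbb{C}^m$. Then

1. $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle$

2. $\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$

Proof. The proofs of the two parts are very similar, with the second one requiring
just a bit more effort due to the conjugation that occurs. We will prove part 1 and
you can prove part 2 (Exercise O.T10).

\begin{align*}
\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle &= \sum_{i=1}^{m} \overline{[\mathbf{u} + \mathbf{v}]_i}\,[\mathbf{w}]_i && \text{Definition IP} \\
&= \sum_{i=1}^{m} \overline{\left([\mathbf{u}]_i + [\mathbf{v}]_i\right)}\,[\mathbf{w}]_i && \text{Definition CVA} \\
&= \sum_{i=1}^{m} \left(\overline{[\mathbf{u}]_i} + \overline{[\mathbf{v}]_i}\right)[\mathbf{w}]_i && \text{Theorem CCRA} \\
&= \sum_{i=1}^{m} \overline{[\mathbf{u}]_i}\,[\mathbf{w}]_i + \overline{[\mathbf{v}]_i}\,[\mathbf{w}]_i && \text{Property DCN} \\
&= \sum_{i=1}^{m} \overline{[\mathbf{u}]_i}\,[\mathbf{w}]_i + \sum_{i=1}^{m} \overline{[\mathbf{v}]_i}\,[\mathbf{w}]_i && \text{Property CACN} \\
&= \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle && \text{Definition IP}
\end{align*}

■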


Theorem  IPSM  Inner  Product  and  Scalar  Multiplication 
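Suppose $\mathbf{u}, \mathbf{v} \in \mathbb{C}^m$ and $\alpha \in \mathbb{C}$. Then

1. $\langle \alpha\mathbf{u}, \mathbf{v} \rangle = \overline{\alpha}\, \langle \mathbf{u}, \mathbf{v} \rangle$

2. $\langle \mathbf{u}, \alpha\mathbf{v} \rangle = \alpha\, \langle \mathbf{u}, \mathbf{v} \rangle$

Proof. The proofs of the two parts are very similar, with the second one requiring
just a bit more effort due to the conjugation that occurs. We will prove part 1 and
you can prove part 2 (Exercise O.T11).

\begin{align*}
\langle \alpha\mathbf{u}, \mathbf{v} \rangle &= \sum_{i=1}^{m} \overline{[\alpha\mathbf{u}]_i}\,[\mathbf{v}]_i && \text{Definition IP} \\
&= \sum_{i=1}^{m} \overline{\alpha [\mathbf{u}]_i}\,[\mathbf{v}]_i && \text{Definition CVSM} \\
&= \sum_{i=1}^{m} \overline{\alpha}\,\overline{[\mathbf{u}]_i}\,[\mathbf{v}]_i && \text{Theorem CCRM} \\
&= \overline{\alpha} \sum_{i=1}^{m} \overline{[\mathbf{u}]_i}\,[\mathbf{v}]_i && \text{Property DCN} \\
&= \overline{\alpha}\, \langle \mathbf{u}, \mathbf{v} \rangle && \text{Definition IP}
\end{align*}

■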


Theorem  IPAC  Inner  Product  is  Anti-Commutative 
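Suppose that $\mathbf{u}$ and $\mathbf{v}$ are vectors in $\mathbb{C}^m$. Then $\langle \mathbf{u}, \mathbf{v} \rangle = \overline{\langle \mathbf{v}, \mathbf{u} \rangle}$.

Proof.

\begin{align*}
\langle \mathbf{u}, \mathbf{v} \rangle &= \sum_{i=1}^{m} \overline{[\mathbf{u}]_i}\,[\mathbf{v}]_i && \text{Definition IP} \\
&= \sum_{i=1}^{m} \overline{[\mathbf{u}]_i}\;\overline{\overline{[\mathbf{v}]_i}} && \text{Theorem CCT} \\
&= \sum_{i=1}^{m} \overline{[\mathbf{u}]_i\,\overline{[\mathbf{v}]_i}} && \text{Theorem CCRM} \\
&= \overline{\left(\sum_{i=1}^{m} [\mathbf{u}]_i\,\overline{[\mathbf{v}]_i}\right)} && \text{Theorem CCRA} \\
&= \overline{\left(\sum_{i=1}^{m} \overline{[\mathbf{v}]_i}\,[\mathbf{u}]_i\right)} && \text{Property CMCN} \\
&= \overline{\langle \mathbf{v}, \mathbf{u} \rangle} && \text{Definition IP}
\end{align*}

■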
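
The three theorems above (IPVA, IPSM, IPAC) can be spot-checked numerically. The following Python sketch is only a sanity check on one set of sample vectors, not a proof; the helper names `inner`, `add`, `scale` and `close`, and the sample data, are assumptions made for this illustration.

```python
def inner(u, v):
    """Inner product conjugating the first vector (Definition IP)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def add(u, v):
    """Vector addition, entry by entry."""
    return [a + b for a, b in zip(u, v)]


def scale(alpha, u):
    """Scalar multiple of a vector, entry by entry."""
    return [alpha * a for a in u]


def close(a, b, tol=1e-9):
    """Compare two complex numbers up to a small tolerance."""
    return abs(a - b) < tol


# Sample data (hypothetical).
u = [1 + 2j, 0 - 3j, 4 + 0j]
v = [2 - 1j, 1 + 1j, -2 + 5j]
w = [0 + 1j, 3 - 2j, 1 + 1j]
alpha = 2 - 3j

checks = {
    "IPVA part 1": close(inner(add(u, v), w), inner(u, w) + inner(v, w)),
    "IPVA part 2": close(inner(u, add(v, w)), inner(u, v) + inner(u, w)),
    "IPSM part 1": close(inner(scale(alpha, u), v), alpha.conjugate() * inner(u, v)),
    "IPSM part 2": close(inner(u, scale(alpha, v)), alpha * inner(u, v)),
    "IPAC": close(inner(u, v), inner(v, u).conjugate()),
}
```

Each entry of `checks` should be true. Note where the conjugate of `alpha` appears: only when the scalar multiplies the first vector.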

Subsection  N 
Norm 


If  treating  linear  algebra  in  a more  geometric  fashion,  the  length  of  a vector  occurs 
naturally,  and  is  what  you  would  expect  from  its  name.  With  complex  numbers,  we 
will  define  a similar  function.  Recall  that  if  c is  a complex  number,  then  |c|  denotes 
its  modulus  (Definition  MCN). 

Definition  NV  Norm  of  a Vector 

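The norm of the vector $\mathbf{u}$ is the scalar quantity in $\mathbb{C}$

\[
\|\mathbf{u}\| = \sqrt{|[\mathbf{u}]_1|^2 + |[\mathbf{u}]_2|^2 + |[\mathbf{u}]_3|^2 + \cdots + |[\mathbf{u}]_m|^2} = \sqrt{\sum_{i=1}^{m} |[\mathbf{u}]_i|^2}
\]

□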


Computing  a norm  is  also  easy  to  do. 


Example  CNSV  Computing  the  norm  of  some  vectors 
The norm of

\[
\mathbf{u} = \begin{bmatrix} 3 + 2i \\ 1 - 6i \\ 2 + 4i \\ 2 + i \end{bmatrix}
\]

is

\[
\|\mathbf{u}\| = \sqrt{|3 + 2i|^2 + |1 - 6i|^2 + |2 + 4i|^2 + |2 + i|^2} = \sqrt{13 + 37 + 20 + 5} = \sqrt{75} = 5\sqrt{3}
\]

The norm of

\[
\mathbf{v} = \begin{bmatrix} 3 \\ -1 \\ 2 \\ 4 \\ -3 \end{bmatrix}
\]

is

\[
\|\mathbf{v}\| = \sqrt{|3|^2 + |-1|^2 + |2|^2 + |4|^2 + |-3|^2} = \sqrt{3^2 + 1^2 + 2^2 + 4^2 + 3^2} = \sqrt{39}.
\]

△


Notice  how  the  norm  of  a vector  with  real  number  entries  is  just  the  length  of 
the  vector.  Inner  products  and  norms  are  related  by  the  following  theorem. 
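
The norms in Example CNSV can be checked with a few lines of Python. The function name `norm` is an assumption of this sketch; it follows Definition NV literally, taking the square root of the sum of the squared moduli of the entries.

```python
import math


def norm(u):
    """Norm of a vector: square root of the sum of squared moduli (Definition NV)."""
    return math.sqrt(sum(abs(z) ** 2 for z in u))


# Vectors from Example CNSV.
u = [3 + 2j, 1 - 6j, 2 + 4j, 2 + 1j]
v = [3, -1, 2, 4, -3]

norm_u = norm(u)  # agrees with sqrt(75) = 5*sqrt(3) up to rounding
norm_v = norm(v)  # agrees with sqrt(39) up to rounding
```

Because the computation is done in floating point, comparisons with the exact values should allow for a small rounding error.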

Theorem  IPN  Inner  Products  and  Norms 
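Suppose that $\mathbf{u}$ is a vector in $\mathbb{C}^m$. Then $\|\mathbf{u}\|^2 = \langle \mathbf{u}, \mathbf{u} \rangle$.

Proof.

\begin{align*}
\|\mathbf{u}\|^2 &= \left(\sqrt{\sum_{i=1}^{m} |[\mathbf{u}]_i|^2}\right)^2 && \text{Definition NV} \\
&= \sum_{i=1}^{m} |[\mathbf{u}]_i|^2 && \text{Inverse functions} \\
&= \sum_{i=1}^{m} \overline{[\mathbf{u}]_i}\,[\mathbf{u}]_i && \text{Definition MCN} \\
&= \langle \mathbf{u}, \mathbf{u} \rangle && \text{Definition IP}
\end{align*}

■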


When  our  vectors  have  entries  only  from  the  real  numbers  Theorem  IPN  says 
that  the  dot  product  of  a vector  with  itself  is  equal  to  the  length  of  the  vector 
squared. 

Theorem  PIP  Positive  Inner  Products 

Suppose that $\mathbf{u}$ is a vector in $\mathbb{C}^m$. Then $\langle \mathbf{u}, \mathbf{u} \rangle \ge 0$ with equality if and only if $\mathbf{u} = \mathbf{0}$.

Proof. From the proof of Theorem IPN we see that

\[
\langle \mathbf{u}, \mathbf{u} \rangle = |[\mathbf{u}]_1|^2 + |[\mathbf{u}]_2|^2 + |[\mathbf{u}]_3|^2 + \cdots + |[\mathbf{u}]_m|^2
\]

Since each modulus is squared, every term is nonnegative, and the sum must also be
nonnegative. (Notice that in general the inner product is a complex number and cannot
be compared with zero, but in the special case of $\langle \mathbf{u}, \mathbf{u} \rangle$ the result is a real number.)

The phrase, “with equality if and only if” means that we want to show that
the statement $\langle \mathbf{u}, \mathbf{u} \rangle = 0$ (i.e. with equality) is equivalent (“if and only if”) to the
statement $\mathbf{u} = \mathbf{0}$.

If $\mathbf{u} = \mathbf{0}$, then it is a straightforward computation to see that $\langle \mathbf{u}, \mathbf{u} \rangle = 0$. In the
other direction, assume that $\langle \mathbf{u}, \mathbf{u} \rangle = 0$. As before, $\langle \mathbf{u}, \mathbf{u} \rangle$ is a sum of squared moduli. So we
have

\[
0 = \langle \mathbf{u}, \mathbf{u} \rangle = |[\mathbf{u}]_1|^2 + |[\mathbf{u}]_2|^2 + |[\mathbf{u}]_3|^2 + \cdots + |[\mathbf{u}]_m|^2
\]

Now we have a sum of nonnegative squares equaling zero, so each term must be zero. Then
by similar logic, $|[\mathbf{u}]_i| = 0$ will imply that $[\mathbf{u}]_i = 0$, since $0 + 0i$ is the only complex
number with zero modulus. Thus every entry of $\mathbf{u}$ is zero and so $\mathbf{u} = \mathbf{0}$, as desired.  ■

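Notice that Theorem PIP contains three implications:

\begin{align*}
\mathbf{u} \in \mathbb{C}^m &\Rightarrow \langle \mathbf{u}, \mathbf{u} \rangle \ge 0 \\
\mathbf{u} = \mathbf{0} &\Rightarrow \langle \mathbf{u}, \mathbf{u} \rangle = 0 \\
\langle \mathbf{u}, \mathbf{u} \rangle = 0 &\Rightarrow \mathbf{u} = \mathbf{0}
\end{align*}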

The  results  contained  in  Theorem  PIP  are  summarized  by  saying  “the  inner 
product  is  positive  definite.” 

Subsection  OV 
Orthogonal  Vectors 

“Orthogonal”  is  a generalization  of  “perpendicular.”  You  may  have  used  mutually 
perpendicular  vectors  in  a physics  class,  or  you  may  recall  from  a calculus  class  that 
perpendicular  vectors  have  a zero  dot  product.  We  will  now  extend  these  ideas  into 
the  realm  of  higher  dimensions  and  complex  scalars. 

Definition  OV  Orthogonal  Vectors 

A pair of vectors, $\mathbf{u}$ and $\mathbf{v}$, from $\mathbb{C}^m$ are orthogonal if their inner product is zero,
that is, $\langle \mathbf{u}, \mathbf{v} \rangle = 0$.  □

Example  TOV  Two  orthogonal  vectors 
The vectors

\[
\mathbf{u} = \begin{bmatrix} 2 + 3i \\ 4 - 2i \\ 1 + i \\ 1 + i \end{bmatrix}
\qquad
\mathbf{v} = \begin{bmatrix} 1 - i \\ 2 + 3i \\ 4 - 6i \\ 1 \end{bmatrix}
\]

are orthogonal since

\begin{align*}
\langle \mathbf{u}, \mathbf{v} \rangle &= (2 - 3i)(1 - i) + (4 + 2i)(2 + 3i) + (1 - i)(4 - 6i) + (1 - i)(1) \\
&= (-1 - 5i) + (2 + 16i) + (-2 - 10i) + (1 - i) \\
&= 0 + 0i.
\end{align*}

△

We extend this definition to whole sets by requiring vectors to be pairwise
orthogonal. Despite using the same word, careful thought about what objects you
are using will eliminate any source of confusion.

Definition  OSV  Orthogonal  Set  of  Vectors 

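Suppose that $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_n\}$ is a set of vectors from $\mathbb{C}^m$. Then $S$ is
an orthogonal set if every pair of different vectors from $S$ is orthogonal, that is
$\langle \mathbf{u}_i, \mathbf{u}_j \rangle = 0$ whenever $i \ne j$.  □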

We  now  define  the  prototypical  orthogonal  set,  which  we  will  reference  repeatedly. 


Definition  SUV  Standard  Unit  Vectors 

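Let $\mathbf{e}_j \in \mathbb{C}^m$, $1 \le j \le m$ denote the column vectors defined by

\[
[\mathbf{e}_j]_i = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases}
\]

Then the set

\[
\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \ldots, \mathbf{e}_m\} = \{\mathbf{e}_j \mid 1 \le j \le m\}
\]

is the set of standard unit vectors in $\mathbb{C}^m$.

□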


Notice that $\mathbf{e}_j$ is identical to column $j$ of the $m \times m$ identity matrix $I_m$ (Definition
IM) and is a pivot column for $I_m$, since the identity matrix is in reduced row-echelon
form. These observations will often be useful. We will reserve the notation $\mathbf{e}_i$ for these
vectors. It is not hard to see that the set of standard unit vectors is an orthogonal
set.


Example  SUVOS  Standard  Unit  Vectors  are  an  Orthogonal  Set 

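Compute the inner product of two distinct vectors from the set of standard unit
vectors (Definition SUV), say $\mathbf{e}_i$, $\mathbf{e}_j$, where $i \ne j$,

\begin{align*}
\langle \mathbf{e}_i, \mathbf{e}_j \rangle &= \overline{0}\,0 + \overline{0}\,0 + \cdots + \overline{1}\,0 + \cdots + \overline{0}\,0 + \cdots + \overline{0}\,1 + \cdots + \overline{0}\,0 + \overline{0}\,0 \\
&= 0(0) + 0(0) + \cdots + 1(0) + \cdots + 0(1) + \cdots + 0(0) + 0(0) \\
&= 0
\end{align*}

So the set $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \ldots, \mathbf{e}_m\}$ is an orthogonal set.  △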


Example  AOS  An  orthogonal  set 
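The set

\[
\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4\} = \left\{
\begin{bmatrix} 1 + i \\ 1 \\ 1 - i \\ i \end{bmatrix},\
\begin{bmatrix} 1 + 5i \\ 6 + 5i \\ -7 - i \\ 1 - 6i \end{bmatrix},\
\begin{bmatrix} -7 + 34i \\ -8 - 23i \\ -10 + 22i \\ 30 + 13i \end{bmatrix},\
\begin{bmatrix} -2 - 4i \\ 6 + i \\ 4 + 3i \\ 6 - i \end{bmatrix}
\right\}
\]

is an orthogonal set.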

Since the inner product is anti-commutative (Theorem IPAC) we can test pairs
of different vectors in any order. If the result is zero, then it will also be zero if the
inner product is computed in the opposite order. This means there are six different
pairs of vectors to use in an inner product computation. We will do two and you
can practice your inner products on the other four.

\begin{align*}
\langle \mathbf{x}_1, \mathbf{x}_3 \rangle &= (1 - i)(-7 + 34i) + (1)(-8 - 23i) + (1 + i)(-10 + 22i) + (-i)(30 + 13i) \\
&= (27 + 41i) + (-8 - 23i) + (-32 + 12i) + (13 - 30i) \\
&= 0 + 0i
\end{align*}

and

\begin{align*}
\langle \mathbf{x}_2, \mathbf{x}_4 \rangle &= (1 - 5i)(-2 - 4i) + (6 - 5i)(6 + i) + (-7 + i)(4 + 3i) + (1 + 6i)(6 - i) \\
&= (-22 + 6i) + (41 - 24i) + (-31 - 17i) + (12 + 35i) \\
&= 0 + 0i
\end{align*}

△
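
All six pairings in Example AOS can be tested at once with a short Python sketch. The vector entries are read off from the two inner products computed above (each first factor is the conjugate of an entry of the first vector); the names `inner`, `pairs` and `is_orthogonal_set` are assumptions of this illustration.

```python
from itertools import combinations


def inner(u, v):
    """Inner product conjugating the first vector (Definition IP)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


x1 = [1 + 1j, 1 + 0j, 1 - 1j, 0 + 1j]
x2 = [1 + 5j, 6 + 5j, -7 - 1j, 1 - 6j]
x3 = [-7 + 34j, -8 - 23j, -10 + 22j, 30 + 13j]
x4 = [-2 - 4j, 6 + 1j, 4 + 3j, 6 - 1j]

# Map each index pair (i, j) with i < j to the inner product <x_i, x_j>.
pairs = {
    (i + 1, j + 1): inner(a, b)
    for (i, a), (j, b) in combinations(enumerate([x1, x2, x3, x4]), 2)
}

# The set is orthogonal when every one of the six inner products is zero.
is_orthogonal_set = all(value == 0 for value in pairs.values())
```

Only pairs with `i < j` are tested; by Theorem IPAC the opposite order gives the conjugate, which is zero exactly when the original is.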

So  far,  this  section  has  seen  lots  of  definitions,  and  lots  of  theorems  establishing 
un-surprising  consequences  of  those  definitions.  But  here  is  our  first  theorem  that 
suggests  that  inner  products  and  orthogonal  vectors  have  some  utility.  It  is  also  one 
of  our  first  illustrations  of  how  to  arrive  at  linear  independence  as  the  conclusion  of 
a theorem. 

Theorem  OSLI  Orthogonal  Sets  are  Linearly  Independent 

Suppose  that  S is  an  orthogonal  set  of  nonzero  vectors.  Then  S is  linearly  independent. 


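Proof. Let $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_n\}$ be an orthogonal set of nonzero vectors. To
prove the linear independence of $S$, we can appeal to the definition (Definition LICV)
and begin with an arbitrary relation of linear dependence (Definition RLDCV),

\[
\alpha_1 \mathbf{u}_1 + \alpha_2 \mathbf{u}_2 + \alpha_3 \mathbf{u}_3 + \cdots + \alpha_n \mathbf{u}_n = \mathbf{0}.
\]

Then, for every $1 \le i \le n$, we have

\begin{align*}
\alpha_i \langle \mathbf{u}_i, \mathbf{u}_i \rangle
&= \alpha_1(0) + \alpha_2(0) + \cdots + \alpha_i \langle \mathbf{u}_i, \mathbf{u}_i \rangle + \cdots + \alpha_n(0) && \text{Property ZCN} \\
&= \alpha_1 \langle \mathbf{u}_i, \mathbf{u}_1 \rangle + \cdots + \alpha_i \langle \mathbf{u}_i, \mathbf{u}_i \rangle + \cdots + \alpha_n \langle \mathbf{u}_i, \mathbf{u}_n \rangle && \text{Definition OSV} \\
&= \langle \mathbf{u}_i, \alpha_1 \mathbf{u}_1 \rangle + \langle \mathbf{u}_i, \alpha_2 \mathbf{u}_2 \rangle + \cdots + \langle \mathbf{u}_i, \alpha_n \mathbf{u}_n \rangle && \text{Theorem IPSM} \\
&= \langle \mathbf{u}_i, \alpha_1 \mathbf{u}_1 + \alpha_2 \mathbf{u}_2 + \alpha_3 \mathbf{u}_3 + \cdots + \alpha_n \mathbf{u}_n \rangle && \text{Theorem IPVA} \\
&= \langle \mathbf{u}_i, \mathbf{0} \rangle && \text{Definition RLDCV} \\
&= 0 && \text{Definition IP}
\end{align*}

Because $\mathbf{u}_i$ was assumed to be nonzero, Theorem PIP says $\langle \mathbf{u}_i, \mathbf{u}_i \rangle$ is nonzero
and thus $\alpha_i$ must be zero. So we conclude that $\alpha_i = 0$ for all $1 \le i \le n$ in any
relation of linear dependence on $S$. But this says that $S$ is a linearly independent
set since the only way to form a relation of linear dependence is the trivial way
(Definition LICV). Boom!  ■

Subsection GSP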
Gram-Schmidt  Procedure 


The  Gram-Schmidt  Procedure  is  really  a theorem.  It  says  that  if  we  begin  with  a 
linearly  independent  set  of  p vectors,  S,  then  we  can  do  a number  of  calculations 
with  these  vectors  and  produce  an  orthogonal  set  of  p vectors,  T,  so  that  (S)  = ( T ). 
Given  the  large  number  of  computations  involved,  it  is  indeed  a procedure  to  do  all 
the  necessary  computations,  and  it  is  best  employed  on  a computer.  However,  it  also 
has  value  in  proofs  where  we  may  on  occasion  wish  to  replace  a linearly  independent 
set  by  an  orthogonal  set. 

This  is  our  first  occasion  to  use  the  technique  of  “mathematical  induction”  for  a 
proof,  a technique  we  will  see  again  several  times,  especially  in  Chapter  D.  So  study 
the  simple  example  described  in  Proof  Technique  I first. 

Theorem  GSP  Gram-Schmidt  Procedure 

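Suppose that $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_p\}$ is a linearly independent set of vectors in
$\mathbb{C}^m$. Define the vectors $\mathbf{u}_i$, $1 \le i \le p$ by

\[
\mathbf{u}_i = \mathbf{v}_i
- \frac{\langle \mathbf{u}_1, \mathbf{v}_i \rangle}{\langle \mathbf{u}_1, \mathbf{u}_1 \rangle}\mathbf{u}_1
- \frac{\langle \mathbf{u}_2, \mathbf{v}_i \rangle}{\langle \mathbf{u}_2, \mathbf{u}_2 \rangle}\mathbf{u}_2
- \frac{\langle \mathbf{u}_3, \mathbf{v}_i \rangle}{\langle \mathbf{u}_3, \mathbf{u}_3 \rangle}\mathbf{u}_3
- \cdots
- \frac{\langle \mathbf{u}_{i-1}, \mathbf{v}_i \rangle}{\langle \mathbf{u}_{i-1}, \mathbf{u}_{i-1} \rangle}\mathbf{u}_{i-1}
\]

Let $T = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_p\}$. Then $T$ is an orthogonal set of nonzero vectors,
and $\langle T \rangle = \langle S \rangle$.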


Proof. We will prove the result by using induction on $p$ (Proof Technique I). To
begin, we prove that $T$ has the desired properties when $p = 1$. In this case $\mathbf{u}_1 = \mathbf{v}_1$
and $T = \{\mathbf{u}_1\} = \{\mathbf{v}_1\} = S$. Because $S$ and $T$ are equal, $\langle S \rangle = \langle T \rangle$. Equally trivial,
$T$ is an orthogonal set. If $\mathbf{u}_1 = \mathbf{0}$, then $S$ would be a linearly dependent set, a
contradiction.

Suppose that the theorem is true for any set of $p - 1$ linearly independent
vectors. Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_p\}$ be a linearly independent set of $p$ vectors.
Then $S' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_{p-1}\}$ is also linearly independent. So we can apply
the theorem to $S'$ and construct the vectors $T' = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_{p-1}\}$. $T'$ is
therefore an orthogonal set of nonzero vectors and $\langle S' \rangle = \langle T' \rangle$. Define


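\[
\mathbf{u}_p = \mathbf{v}_p
- \frac{\langle \mathbf{u}_1, \mathbf{v}_p \rangle}{\langle \mathbf{u}_1, \mathbf{u}_1 \rangle}\mathbf{u}_1
- \frac{\langle \mathbf{u}_2, \mathbf{v}_p \rangle}{\langle \mathbf{u}_2, \mathbf{u}_2 \rangle}\mathbf{u}_2
- \frac{\langle \mathbf{u}_3, \mathbf{v}_p \rangle}{\langle \mathbf{u}_3, \mathbf{u}_3 \rangle}\mathbf{u}_3
- \cdots
- \frac{\langle \mathbf{u}_{p-1}, \mathbf{v}_p \rangle}{\langle \mathbf{u}_{p-1}, \mathbf{u}_{p-1} \rangle}\mathbf{u}_{p-1}
\]

and let $T = T' \cup \{\mathbf{u}_p\}$. We need to now show that $T$ has several properties by
building on what we know about $T'$. But first notice that the above equation has
no problems with the denominators ($\langle \mathbf{u}_i, \mathbf{u}_i \rangle$) being zero, since the $\mathbf{u}_i$ are from $T'$,
which is composed of nonzero vectors.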

We show that $\langle T \rangle = \langle S \rangle$, by first establishing that $\langle T \rangle \subseteq \langle S \rangle$. Suppose $\mathbf{x} \in \langle T \rangle$,
so

\[
\mathbf{x} = a_1 \mathbf{u}_1 + a_2 \mathbf{u}_2 + a_3 \mathbf{u}_3 + \cdots + a_p \mathbf{u}_p
\]

The term $a_p \mathbf{u}_p$ is a linear combination of vectors from $T'$ and the vector $\mathbf{v}_p$, while the
remaining terms are a linear combination of vectors from $T'$. Since $\langle T' \rangle = \langle S' \rangle$, any
term that is a multiple of a vector from $T'$ can be rewritten as a linear combination
of vectors from $S'$. The remaining term $a_p \mathbf{v}_p$ is a multiple of a vector in $S$. So we
see that $\mathbf{x}$ can be rewritten as a linear combination of vectors from $S$, i.e. $\mathbf{x} \in \langle S \rangle$.

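To show that $\langle S \rangle \subseteq \langle T \rangle$, begin with $\mathbf{y} \in \langle S \rangle$, so

\[
\mathbf{y} = a_1 \mathbf{v}_1 + a_2 \mathbf{v}_2 + a_3 \mathbf{v}_3 + \cdots + a_p \mathbf{v}_p
\]

Rearrange our defining equation for $\mathbf{u}_p$ by solving for $\mathbf{v}_p$. Then the term $a_p \mathbf{v}_p$
is a multiple of a linear combination of elements of $T$. The remaining terms are a
linear combination of $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_{p-1}$, hence an element of $\langle S' \rangle = \langle T' \rangle$. Thus
these remaining terms can be written as a linear combination of the vectors in $T'$.
So $\mathbf{y}$ is a linear combination of vectors from $T$, i.e. $\mathbf{y} \in \langle T \rangle$.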

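The elements of $T'$ are nonzero, but what about $\mathbf{u}_p$? Suppose to the contrary
that $\mathbf{u}_p = \mathbf{0}$,

\begin{align*}
\mathbf{0} = \mathbf{u}_p &= \mathbf{v}_p
- \frac{\langle \mathbf{u}_1, \mathbf{v}_p \rangle}{\langle \mathbf{u}_1, \mathbf{u}_1 \rangle}\mathbf{u}_1
- \frac{\langle \mathbf{u}_2, \mathbf{v}_p \rangle}{\langle \mathbf{u}_2, \mathbf{u}_2 \rangle}\mathbf{u}_2
- \frac{\langle \mathbf{u}_3, \mathbf{v}_p \rangle}{\langle \mathbf{u}_3, \mathbf{u}_3 \rangle}\mathbf{u}_3
- \cdots
- \frac{\langle \mathbf{u}_{p-1}, \mathbf{v}_p \rangle}{\langle \mathbf{u}_{p-1}, \mathbf{u}_{p-1} \rangle}\mathbf{u}_{p-1} \\
\mathbf{v}_p &=
\frac{\langle \mathbf{u}_1, \mathbf{v}_p \rangle}{\langle \mathbf{u}_1, \mathbf{u}_1 \rangle}\mathbf{u}_1
+ \frac{\langle \mathbf{u}_2, \mathbf{v}_p \rangle}{\langle \mathbf{u}_2, \mathbf{u}_2 \rangle}\mathbf{u}_2
+ \frac{\langle \mathbf{u}_3, \mathbf{v}_p \rangle}{\langle \mathbf{u}_3, \mathbf{u}_3 \rangle}\mathbf{u}_3
+ \cdots
+ \frac{\langle \mathbf{u}_{p-1}, \mathbf{v}_p \rangle}{\langle \mathbf{u}_{p-1}, \mathbf{u}_{p-1} \rangle}\mathbf{u}_{p-1}
\end{align*}

Since $\langle S' \rangle = \langle T' \rangle$ we can write the vectors $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_{p-1}$ on the right side
of this equation in terms of the vectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_{p-1}$ and we then have the
vector $\mathbf{v}_p$ expressed as a linear combination of the other $p - 1$ vectors in $S$, implying
that $S$ is a linearly dependent set (Theorem DLDS), contrary to our lone hypothesis
about $S$.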

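Finally, it is a simple matter to establish that $T$ is an orthogonal set, though it
will not appear so simple looking. Think about your objects as you work through
the following: what is a vector and what is a scalar. Since $T'$ is an orthogonal set
by induction, most pairs of elements in $T$ are already known to be orthogonal. We
just need to test “new” inner products, between $\mathbf{u}_p$ and $\mathbf{u}_i$, for $1 \le i \le p - 1$. Here
we go, using summation notation,

\begin{align*}
\langle \mathbf{u}_i, \mathbf{u}_p \rangle
&= \left\langle \mathbf{u}_i,\ \mathbf{v}_p - \sum_{k=1}^{p-1} \frac{\langle \mathbf{u}_k, \mathbf{v}_p \rangle}{\langle \mathbf{u}_k, \mathbf{u}_k \rangle}\mathbf{u}_k \right\rangle \\
&= \langle \mathbf{u}_i, \mathbf{v}_p \rangle - \left\langle \mathbf{u}_i,\ \sum_{k=1}^{p-1} \frac{\langle \mathbf{u}_k, \mathbf{v}_p \rangle}{\langle \mathbf{u}_k, \mathbf{u}_k \rangle}\mathbf{u}_k \right\rangle && \text{Theorem IPVA} \\
&= \langle \mathbf{u}_i, \mathbf{v}_p \rangle - \sum_{k=1}^{p-1} \left\langle \mathbf{u}_i,\ \frac{\langle \mathbf{u}_k, \mathbf{v}_p \rangle}{\langle \mathbf{u}_k, \mathbf{u}_k \rangle}\mathbf{u}_k \right\rangle && \text{Theorem IPVA} \\
&= \langle \mathbf{u}_i, \mathbf{v}_p \rangle - \sum_{k=1}^{p-1} \frac{\langle \mathbf{u}_k, \mathbf{v}_p \rangle}{\langle \mathbf{u}_k, \mathbf{u}_k \rangle} \langle \mathbf{u}_i, \mathbf{u}_k \rangle && \text{Theorem IPSM} \\
&= \langle \mathbf{u}_i, \mathbf{v}_p \rangle - \frac{\langle \mathbf{u}_i, \mathbf{v}_p \rangle}{\langle \mathbf{u}_i, \mathbf{u}_i \rangle} \langle \mathbf{u}_i, \mathbf{u}_i \rangle - \sum_{k \ne i} \frac{\langle \mathbf{u}_k, \mathbf{v}_p \rangle}{\langle \mathbf{u}_k, \mathbf{u}_k \rangle}(0) && \text{Induction Hypothesis} \\
&= \langle \mathbf{u}_i, \mathbf{v}_p \rangle - \langle \mathbf{u}_i, \mathbf{v}_p \rangle - \sum_{k \ne i} 0 \\
&= 0
\end{align*}

■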

Example  GSTV  Gram-Schmidt  of  three  vectors 

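We will illustrate the Gram-Schmidt process with three vectors. Begin with the
linearly independent (check this!) set $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. Then

\begin{align*}
\mathbf{u}_1 &= \mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 + i \\ 1 \end{bmatrix} \\
\mathbf{u}_2 &= \mathbf{v}_2 - \frac{\langle \mathbf{u}_1, \mathbf{v}_2 \rangle}{\langle \mathbf{u}_1, \mathbf{u}_1 \rangle}\mathbf{u}_1
= \frac{1}{4} \begin{bmatrix} -2 - 3i \\ 1 - i \\ 2 + 5i \end{bmatrix} \\
\mathbf{u}_3 &= \mathbf{v}_3 - \frac{\langle \mathbf{u}_1, \mathbf{v}_3 \rangle}{\langle \mathbf{u}_1, \mathbf{u}_1 \rangle}\mathbf{u}_1 - \frac{\langle \mathbf{u}_2, \mathbf{v}_3 \rangle}{\langle \mathbf{u}_2, \mathbf{u}_2 \rangle}\mathbf{u}_2
= \frac{1}{11} \begin{bmatrix} -3 - i \\ 1 + 3i \\ -1 - i \end{bmatrix}
\end{align*}

and

\[
T = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\} = \left\{
\begin{bmatrix} 1 \\ 1 + i \\ 1 \end{bmatrix},\
\frac{1}{4} \begin{bmatrix} -2 - 3i \\ 1 - i \\ 2 + 5i \end{bmatrix},\
\frac{1}{11} \begin{bmatrix} -3 - i \\ 1 + 3i \\ -1 - i \end{bmatrix}
\right\}
\]

is an orthogonal set (which you can check) of nonzero vectors and $\langle T \rangle = \langle S \rangle$ (all
by Theorem GSP). Of course, as a by-product of orthogonality, the set $T$ is also
linearly independent (Theorem OSLI).  △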
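
The formula of Theorem GSP translates directly into a short loop. The Python sketch below is one possible rendering, under these assumptions: the function names `inner` and `gram_schmidt` are hypothetical, the input is assumed to be linearly independent (so no denominator is zero), and the second and third input vectors are choices made for this illustration rather than data taken from the text. With this input the output agrees with the vectors $\mathbf{u}_2$ and $\mathbf{u}_3$ listed in Example GSTV.

```python
def inner(u, v):
    """Inner product conjugating the first vector (Definition IP)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Build u_1, ..., u_p from v_1, ..., v_p using the formula of Theorem GSP.

    Assumes the input list is linearly independent, so <u_k, u_k> is never zero.
    """
    us = []
    for v in vectors:
        u = list(v)
        for uk in us:
            # Coefficient <u_k, v_i> / <u_k, u_k>, using the original v_i.
            coeff = inner(uk, v) / inner(uk, uk)
            u = [a - coeff * b for a, b in zip(u, uk)]
        us.append(u)
    return us


# Illustrative input: v_1 matches Example GSTV; v_2 and v_3 are assumed.
S = [[1, 1 + 1j, 1], [-1j, 1, 1 + 1j], [0, 1j, 1j]]
T = gram_schmidt(S)

# Every pair of different output vectors should have inner product zero.
max_off_diagonal = max(
    abs(inner(T[i], T[j])) for i in range(len(T)) for j in range(i + 1, len(T))
)
```

Floating-point arithmetic leaves tiny rounding errors, so `max_off_diagonal` is expected to be very close to zero rather than exactly zero. An exact computation would need exact arithmetic, which a system such as Sage can provide.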

One  final  definition  related  to  orthogonal  vectors. 


Definition  ONS  OrthoNormal  Set 

Suppose $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_n\}$ is an orthogonal set of vectors such that $\|\mathbf{u}_i\| = 1$
for all $1 \le i \le n$. Then $S$ is an orthonormal set of vectors.  □


Once you have an orthogonal set, it is easy to convert it to an orthonormal set:
multiply each vector by the reciprocal of its norm, and the resulting vector will
have norm 1. This scaling of each vector will not affect the orthogonality properties
(apply Theorem IPSM).


Example  ONTV  Orthonormal  set,  three  vectors 
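The set

\[
T = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\} = \left\{
\begin{bmatrix} 1 \\ 1 + i \\ 1 \end{bmatrix},\
\frac{1}{4} \begin{bmatrix} -2 - 3i \\ 1 - i \\ 2 + 5i \end{bmatrix},\
\frac{1}{11} \begin{bmatrix} -3 - i \\ 1 + 3i \\ -1 - i \end{bmatrix}
\right\}
\]

from Example GSTV is an orthogonal set.
We compute the norm of each vector,

\[
\|\mathbf{u}_1\| = 2 \qquad \|\mathbf{u}_2\| = \frac{1}{2}\sqrt{11} \qquad \|\mathbf{u}_3\| = \frac{\sqrt{2}}{\sqrt{11}}
\]

Converting each vector to a norm of 1, yields an orthonormal set,

\begin{align*}
\mathbf{w}_1 &= \frac{1}{2} \begin{bmatrix} 1 \\ 1 + i \\ 1 \end{bmatrix} \\
\mathbf{w}_2 &= \frac{1}{\frac{1}{2}\sqrt{11}} \cdot \frac{1}{4} \begin{bmatrix} -2 - 3i \\ 1 - i \\ 2 + 5i \end{bmatrix}
= \frac{1}{2\sqrt{11}} \begin{bmatrix} -2 - 3i \\ 1 - i \\ 2 + 5i \end{bmatrix} \\
\mathbf{w}_3 &= \frac{1}{\frac{\sqrt{2}}{\sqrt{11}}} \cdot \frac{1}{11} \begin{bmatrix} -3 - i \\ 1 + 3i \\ -1 - i \end{bmatrix}
= \frac{1}{\sqrt{22}} \begin{bmatrix} -3 - i \\ 1 + 3i \\ -1 - i \end{bmatrix}
\end{align*}

△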
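
The scaling step of Example ONTV is a one-line operation per vector. In the Python sketch below the names `norm` and `normalize` are assumptions of this illustration; the three vectors are the orthogonal set from Example GSTV.

```python
import math


def norm(u):
    """Norm of a vector (Definition NV)."""
    return math.sqrt(sum(abs(z) ** 2 for z in u))


def normalize(u):
    """Multiply a nonzero vector by the reciprocal of its norm."""
    n = norm(u)
    if n == 0:
        raise ValueError("the zero vector cannot be scaled to norm 1")
    return [z / n for z in u]


# The orthogonal set T from Example GSTV.
u1 = [1, 1 + 1j, 1]
u2 = [z / 4 for z in (-2 - 3j, 1 - 1j, 2 + 5j)]
u3 = [z / 11 for z in (-3 - 1j, 1 + 3j, -1 - 1j)]

norms = [norm(u) for u in (u1, u2, u3)]       # compare with 2, sqrt(11)/2, sqrt(2/11)
W = [normalize(u) for u in (u1, u2, u3)]      # each vector now has norm 1 (up to rounding)
```

The guard against a zero norm matters in general, although Theorem GSP guarantees that it never triggers for the output of the Gram-Schmidt procedure.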


Example  ONFV  Orthonormal  set,  four  vectors 
As  an  exercise  convert  the  linearly  independent  set 


to  an  orthogonal  set  via  the  Gram-Schmidt  Process  (Theorem  GSP)  and  then  scale 
the  vectors  to  norm  1 to  create  an  orthonormal  set.  You  should  get  the  same  set  you 
would  if  you  scaled  the  orthogonal  set  of  Example  AOS  to  become  an  orthonormal 
set.  A 


We  will  see  orthonormal  sets  again  in  Subsection  MINM.UM.  They  are  intimately 
related  to  unitary  matrices  (Definition  UM)  through  Theorem  CUMOS.  Some  of 
the  utility  of  orthonormal  sets  is  captured  by  Theorem  COB  in  Subsection  B.OBC. 
Orthonormal  sets  appear  once  again  in  Section  OD  where  they  are  key  in  orthonormal 
diagonalization.

Reading  Questions

1.  Is  the  set 


an  orthogonal  set?  Why? 

2.  What  is  the  distinction  between  an  orthogonal  set  and  an  orthonormal  set? 

3.  What  is  nice  about  the  output  of  the  Gram-Schmidt  process? 

Exercises 

C20  Complete  Example  AOS  by  verifying  that  the  four  remaining  inner  products  are 
zero. 

C21  Verify  that  the  set  T created  in  Example  GSTV  by  the  Gram-Schmidt  Procedure  is 
an  orthogonal  set. 

M60  Suppose that $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\} \subseteq \mathbb{C}^n$ is an orthonormal set. Prove that $\mathbf{u} + \mathbf{v}$ is not
orthogonal to $\mathbf{v} + \mathbf{w}$.

T10  Prove  part  2 of  the  conclusion  of  Theorem  IPVA. 

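T11  Prove part 2 of the conclusion of Theorem IPSM.

T20†  Suppose that $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathbb{C}^n$, $\alpha, \beta \in \mathbb{C}$ and $\mathbf{u}$ is orthogonal to both $\mathbf{v}$ and $\mathbf{w}$. Prove
that $\mathbf{u}$ is orthogonal to $\alpha\mathbf{v} + \beta\mathbf{w}$.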

T30  Suppose that the set $S$ in the hypothesis of Theorem GSP is not just linearly
independent, but is also orthogonal. Prove that the set $T$ created by the Gram-Schmidt
procedure is equal to $S$. (Note that we are getting a stronger conclusion than $\langle T \rangle = \langle S \rangle$:
the conclusion is that $T = S$.) In other words, it is pointless to apply the Gram-Schmidt
procedure to a set that is already orthogonal.

T31  Suppose that the set $S$ is linearly independent. Apply the Gram-Schmidt procedure
(Theorem GSP) twice, creating first the linearly independent set $T_1$ from $S$, and then
creating $T_2$ from $T_1$. As a consequence of Exercise O.T30, prove that $T_1 = T_2$. In other
words, it is pointless to apply the Gram-Schmidt procedure twice.


Chapter  M 
Matrices 


We  have  made  frequent  use  of  matrices  for  solving  systems  of  equations,  and  we 
have  begun  to  investigate  a few  of  their  properties,  such  as  the  null  space  and 
nonsingularity.  In  this  chapter,  we  will  take  a more  systematic  approach  to  the  study 
of  matrices. 


Section  MO 
Matrix  Operations 

In  this  section  we  will  back  up  and  start  simple.  We  begin  with  a definition  of  a 
totally  general  set  of  matrices,  and  see  where  that  takes  us. 

Subsection  MEASM 

Matrix  Equality,  Addition,  Scalar  Multiplication 

Definition  VSM  Vector  Space  of  m x n Matrices 

The vector space $M_{mn}$ is the set of all $m \times n$ matrices with entries from the set of
complex numbers.  □

Just  as  we  made,  and  used,  a careful  definition  of  equality  for  column  vectors,  so 
too,  we  have  precise  definitions  for  matrices. 

Definition  ME  Matrix  Equality 

The $m \times n$ matrices $A$ and $B$ are equal, written $A = B$ provided $[A]_{ij} = [B]_{ij}$ for
all $1 \le i \le m$, $1 \le j \le n$.  □

So equality of matrices translates to the equality of complex numbers, on an
entry-by-entry basis. Notice that we now have yet another definition that uses the
symbol “=” for shorthand. Whenever a theorem has a conclusion saying two matrices
are equal (think about your objects), we will consider appealing to this definition as
a way of formulating the top-level structure of the proof.

We will now define two operations on the set $M_{mn}$. Again, we will overload a
symbol (“+”) and a convention (juxtaposition for scalar multiplication).

Definition  MA  Matrix  Addition 

Given the $m \times n$ matrices $A$ and $B$, define the sum of $A$ and $B$ as an $m \times n$ matrix,
written $A + B$, according to

\[
[A + B]_{ij} = [A]_{ij} + [B]_{ij} \qquad 1 \le i \le m,\ 1 \le j \le n
\]

□


So  matrix  addition  takes  two  matrices  of  the  same  size  and  combines  them  (in  a 
natural  way!)  to  create  a new  matrix  of  the  same  size.  Perhaps  this  is  the  “obvious” 
thing  to  do,  but  it  does  not  relieve  us  from  the  obligation  to  state  it  carefully. 


Example  MA  Addition  of  two  matrices  in  M23 
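If

\[
A = \begin{bmatrix} 2 & -3 & 4 \\ 1 & 0 & -7 \end{bmatrix}
\qquad
B = \begin{bmatrix} 6 & 2 & -4 \\ 3 & 5 & 2 \end{bmatrix}
\]

then

\[
A + B
= \begin{bmatrix} 2 & -3 & 4 \\ 1 & 0 & -7 \end{bmatrix} + \begin{bmatrix} 6 & 2 & -4 \\ 3 & 5 & 2 \end{bmatrix}
= \begin{bmatrix} 2 + 6 & -3 + 2 & 4 + (-4) \\ 1 + 3 & 0 + 5 & -7 + 2 \end{bmatrix}
= \begin{bmatrix} 8 & -1 & 0 \\ 4 & 5 & -5 \end{bmatrix}
\]

△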


Our  second  operation  takes  two  objects  of  different  types,  specifically  a number 
and  a matrix,  and  combines  them  to  create  another  matrix.  As  with  vectors,  in  this 
context  we  call  a number  a scalar  in  order  to  emphasize  that  it  is  not  a matrix. 


Definition  MSM  Matrix  Scalar  Multiplication 

Given the $m \times n$ matrix $A$ and the scalar $\alpha \in \mathbb{C}$, the scalar multiple of $A$ is an
$m \times n$ matrix, written $\alpha A$ and defined according to

\[
[\alpha A]_{ij} = \alpha [A]_{ij} \qquad 1 \le i \le m,\ 1 \le j \le n
\]

□


Notice  again  that  we  have  yet  another  kind  of  multiplication,  and  it  is  again  written 
putting  two  symbols  side-by-side.  Computationally,  scalar  matrix  multiplication  is 
very  easy. 

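Example MSM  Scalar multiplication in $M_{32}$
If

\[
A = \begin{bmatrix} 2 & 8 \\ -3 & 5 \\ 0 & 1 \end{bmatrix}
\]

and $\alpha = 7$, then

\[
\alpha A = 7 \begin{bmatrix} 2 & 8 \\ -3 & 5 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 7(2) & 7(8) \\ 7(-3) & 7(5) \\ 7(0) & 7(1) \end{bmatrix}
= \begin{bmatrix} 14 & 56 \\ -21 & 35 \\ 0 & 7 \end{bmatrix}
\]

△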
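
Definition MA and Definition MSM are both entry-by-entry rules, so they are easy to express in code. The Python sketch below stores a matrix as a list of rows; the function names `mat_add` and `scalar_mult` are assumptions of this illustration, and the data is taken from Example MA and Example MSM.

```python
def mat_add(A, B):
    """Entry-by-entry sum of two matrices of the same size (Definition MA)."""
    if len(A) != len(B) or any(len(r) != len(s) for r, s in zip(A, B)):
        raise ValueError("matrices must have the same size")
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]


def scalar_mult(alpha, A):
    """Multiply every entry of A by the scalar alpha (Definition MSM)."""
    return [[alpha * a for a in row] for row in A]


# Example MA: two matrices in M_23.
A = [[2, -3, 4], [1, 0, -7]]
B = [[6, 2, -4], [3, 5, 2]]
total = mat_add(A, B)        # [[8, -1, 0], [4, 5, -5]]

# Example MSM: a matrix in M_32 scaled by 7.
C = [[2, 8], [-3, 5], [0, 1]]
scaled = scalar_mult(7, C)   # [[14, 56], [-21, 35], [0, 7]]
```

The size check in `mat_add` reflects the fact that the sum is only defined for two matrices of the same size.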


Subsection  VSP 
Vector  Space  Properties 

With  definitions  of  matrix  addition  and  scalar  multiplication  we  can  now  state,  and 
prove,  several  properties  of  each  operation,  and  some  properties  that  involve  their 
interplay.  We  now  collect  ten  of  them  here  for  later  reference. 

Theorem  VSPM  Vector  Space  Properties  of  Matrices 

Suppose that $M_{mn}$ is the set of all $m \times n$ matrices (Definition VSM) with addition
and scalar multiplication as defined in Definition MA and Definition MSM. Then

• ACM Additive Closure, Matrices
If $A, B \in M_{mn}$, then $A + B \in M_{mn}$.

• SCM Scalar Closure, Matrices
If $\alpha \in \mathbb{C}$ and $A \in M_{mn}$, then $\alpha A \in M_{mn}$.

• CM Commutativity, Matrices
If $A, B \in M_{mn}$, then $A + B = B + A$.

• AAM Additive Associativity, Matrices
If $A, B, C \in M_{mn}$, then $A + (B + C) = (A + B) + C$.

• ZM Zero Matrix, Matrices
There is a matrix, $\mathcal{O}$, called the zero matrix, such that $A + \mathcal{O} = A$ for all
$A \in M_{mn}$.

• AIM Additive Inverses, Matrices
If $A \in M_{mn}$, then there exists a matrix $-A \in M_{mn}$ so that $A + (-A) = \mathcal{O}$.

• SMAM Scalar Multiplication Associativity, Matrices
If $\alpha, \beta \in \mathbb{C}$ and $A \in M_{mn}$, then $\alpha(\beta A) = (\alpha\beta)A$.

• DMAM Distributivity across Matrix Addition, Matrices
If $\alpha \in \mathbb{C}$ and $A, B \in M_{mn}$, then $\alpha(A + B) = \alpha A + \alpha B$.

• DSAM Distributivity across Scalar Addition, Matrices
If $\alpha, \beta \in \mathbb{C}$ and $A \in M_{mn}$, then $(\alpha + \beta)A = \alpha A + \beta A$.

• OM One, Matrices
If $A \in M_{mn}$, then $1A = A$.


Proof.  While  some  of  these  properties  seem  very  obvious,  they  all  require  proof. 
However,  the  proofs  are  not  very  interesting,  and  border  on  tedious.  We  will  prove 
one  version  of  distributivity  very  carefully,  and  you  can  test  your  proof-building  skills 
on  some  of  the  others.  We  will  give  our  new  notation  for  matrix  entries  a workout 
here.  Compare  the  style  of  the  proofs  here  with  those  given  for  vectors  in  Theorem 
VSPCV  — while  the  objects  here  are  more  complicated,  our  notation  makes  the 
proofs  cleaner. 

To prove Property DSAM, $(\alpha + \beta)A = \alpha A + \beta A$, we need to establish the equality
of two matrices (see Proof Technique GS). Definition ME says we need to establish
the equality of their entries, one-by-one. How do we do this, when we do not even
know how many entries the two matrices might have? This is where the notation for
matrix entries, given in Definition M, comes into play. Ready? Here we go.

For any $i$ and $j$, $1 \le i \le m$, $1 \le j \le n$,

\begin{align*}
[(\alpha + \beta)A]_{ij} &= (\alpha + \beta)[A]_{ij} && \text{Definition MSM} \\
&= \alpha [A]_{ij} + \beta [A]_{ij} && \text{Distributivity in } \mathbb{C} \\
&= [\alpha A]_{ij} + [\beta A]_{ij} && \text{Definition MSM} \\
&= [\alpha A + \beta A]_{ij} && \text{Definition MA}
\end{align*}


There  are  several  things  to  notice  here.  (1)  Each  equals  sign  is  an  equality  of 
scalars  (numbers).  (2)  The  two  ends  of  the  equation,  being  true  for  any  i and  j , 
allow  us  to  conclude  the  equality  of  the  matrices  by  Definition  ME.  (3)  There  are 
several  plus  signs,  and  several  instances  of  juxtaposition.  Identify  each  one,  and  state 
exactly  what  operation  is  being  represented  by  each.  ■ 


For  now,  note  the  similarities  between  Theorem  VSPM  about  matrices  and 
Theorem  VSPCV  about  vectors. 

The zero matrix described in this theorem, $\mathcal{O}$, is what you would expect: a
matrix full of zeros.

Definition  ZM  Zero  Matrix 

The $m \times n$ zero matrix is written as $\mathcal{O} = \mathcal{O}_{m \times n}$ and defined by $[\mathcal{O}]_{ij} = 0$, for all
$1 \le i \le m$, $1 \le j \le n$. □

Subsection TSM

Transposes  and  Symmetric  Matrices 

We  describe  one  more  common  operation  we  can  perform  on  matrices.  Informally, 
to  transpose  a matrix  is  to  build  a new  matrix  by  swapping  its  rows  and  columns. 

Definition  TM  Transpose  of  a Matrix 

Given an $m \times n$ matrix $A$, its transpose is the $n \times m$ matrix $A^t$ given by
$[A^t]_{ij} = [A]_{ji}$, $1 \le i \le n$, $1 \le j \le m$.

□ 


Example  TM  Transpose  of  a 3 x 4 matrix 
Suppose 


\[
D = \begin{bmatrix} 3 & 7 & 2 & -3 \\ -1 & 4 & 2 & 8 \\ 0 & 3 & -2 & 5 \end{bmatrix}
\]


We  could  formulate  the  transpose,  entry-by-entry,  using  the  definition.  But  it  is 
easier  to  just  systematically  rewrite  rows  as  columns  (or  vice-versa).  The  form  of 
the  definition  given  will  be  more  useful  in  proofs.  So  we  have 


\[
D^t = \begin{bmatrix} 3 & -1 & 0 \\ 7 & 4 & 3 \\ 2 & 2 & -2 \\ -3 & 8 & 5 \end{bmatrix}
\]

△
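Definition TM translates almost directly into code. The sketch below assumes a matrix is stored as a list of rows; the function name `transpose` is an illustrative choice, and the data is the matrix of Example TM.

```python
def transpose(A):
    """Definition TM: entry (i, j) of the transpose is entry (j, i) of A."""
    m, n = len(A), len(A[0])
    return [[A[j][i] for j in range(m)] for i in range(n)]


# The 3 x 4 matrix D from Example TM.
D = [
    [3, 7, 2, -3],
    [-1, 4, 2, 8],
    [0, 3, -2, 5],
]

Dt = transpose(D)
for row in Dt:
    print(row)
```

The result has 4 rows and 3 columns, and each row of `Dt` is a column of `D`, which is the "rewrite rows as columns" shortcut described in the example.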


It  will  sometimes  happen  that  a matrix  is  equal  to  its  transpose.  In  this  case,  we 
will  call  a matrix  symmetric.  These  matrices  occur  naturally  in  certain  situations, 
and  also  have  some  nice  properties,  so  it  is  worth  stating  the  definition  carefully. 
Informally  a matrix  is  symmetric  if  we  can  “flip”  it  about  the  main  diagonal  (upper- 
left  corner,  running  down  to  the  lower-right  corner)  and  have  it  look  unchanged. 


Definition SYM Symmetric Matrix
The matrix $A$ is symmetric if $A = A^t$. □

Example SYM A symmetric 5 × 5 matrix
The matrix
\[
E = \begin{bmatrix}
2 & 3 & -9 & 5 & 7 \\
3 & 1 & 6 & -2 & -3 \\
-9 & 6 & 0 & -1 & 9 \\
5 & -2 & -1 & 4 & -8 \\
7 & -3 & 9 & -8 & -3
\end{bmatrix}
\]
is symmetric. △


You might have noticed that Definition SYM did not specify the size of the matrix
A, as has been our custom. That is because it was not necessary. An alternative
would  have  been  to  state  the  definition  just  for  square  matrices,  but  this  is  the 
substance  of  the  next  proof. 

Before  reading  the  next  proof,  we  want  to  offer  you  some  advice  about  how  to 
become  more  proficient  at  constructing  proofs.  Perhaps  you  can  apply  this  advice  to 
the  next  theorem.  Have  a peek  at  Proof  Technique  P now. 

Theorem  SMS  Symmetric  Matrices  are  Square 
Suppose  that  A is  a symmetric  matrix.  Then  A is  square. 

Proof.  We  start  by  specifying  A’s  size,  without  assuming  it  is  square,  since  we  are 
trying  to  prove  that,  so  we  cannot  also  assume  it.  Suppose  A is  an  m x n matrix. 
Because $A$ is symmetric, we know by Definition SYM that $A = A^t$. So, in particular,
Definition ME requires that $A$ and $A^t$ must have the same size. The size of $A^t$ is
$n \times m$. Because $A$ has $m$ rows and $A^t$ has $n$ rows, we conclude that $m = n$, and
hence  A must  be  square  by  Definition  SQM.  ■ 
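A small computational check makes the content of Theorem SMS concrete. This is a sketch under the assumption that matrices are lists of rows; `is_symmetric` is a hypothetical helper, `E` is the matrix of Example SYM, and `R` is a made-up non-square matrix.

```python
def transpose(A):
    """Definition TM: swap the roles of rows and columns."""
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]


def is_symmetric(A):
    """Definition SYM: A is symmetric when A equals its transpose.

    No size check is needed: a non-square matrix and its transpose have
    different sizes, so the comparison fails on its own (Theorem SMS).
    """
    return A == transpose(A)


# The 5 x 5 matrix E from Example SYM.
E = [
    [2, 3, -9, 5, 7],
    [3, 1, 6, -2, -3],
    [-9, 6, 0, -1, 9],
    [5, -2, -1, 4, -8],
    [7, -3, 9, -8, -3],
]

# A hypothetical 2 x 3 matrix, used only to show the non-square case.
R = [[1, 2, 3], [2, 4, 5]]

print(is_symmetric(E))
print(is_symmetric(R))
```

Just as in the definition, the function never mentions the size of its argument; the size constraint falls out of matrix equality.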


We  finish  this  section  with  three  easy  theorems,  but  they  illustrate  the  interplay 
of  our  three  new  operations,  our  new  notation,  and  the  techniques  used  to  prove 
matrix  equalities. 

Theorem  TMA  Transpose  and  Matrix  Addition 

Suppose that $A$ and $B$ are $m \times n$ matrices. Then $(A + B)^t = A^t + B^t$.

Proof. The statement to be proved is an equality of matrices, so we work entry-by-entry
and use Definition ME. Think carefully about the objects involved here, and
the many uses of the plus sign. Realize too that while $A$ and $B$ are $m \times n$ matrices,
the conclusion is a statement about the equality of two $n \times m$ matrices. So we begin
with: for $1 \le i \le n$, $1 \le j \le m$,

\begin{align*}
[(A + B)^t]_{ij} &= [A + B]_{ji} && \text{Definition TM} \\
&= [A]_{ji} + [B]_{ji} && \text{Definition MA} \\
&= [A^t]_{ij} + [B^t]_{ij} && \text{Definition TM} \\
&= [A^t + B^t]_{ij} && \text{Definition MA}
\end{align*}

Since the matrices $(A + B)^t$ and $A^t + B^t$ agree at each entry, Definition ME tells
us the two matrices are equal. ■


Theorem  TMSM  Transpose  and  Matrix  Scalar  Multiplication 
Suppose that $\alpha \in \mathbb{C}$ and $A$ is an $m \times n$ matrix. Then $(\alpha A)^t = \alpha A^t$.

Proof. The statement to be proved is an equality of matrices, so we work entry-by-entry
and use Definition ME. Notice that the desired equality is of $n \times m$ matrices, and
think carefully about the objects involved here, plus the many uses of juxtaposition.
For $1 \le i \le m$, $1 \le j \le n$,

\begin{align*}
[(\alpha A)^t]_{ji} &= [\alpha A]_{ij} && \text{Definition TM} \\
&= \alpha [A]_{ij} && \text{Definition MSM} \\
&= \alpha [A^t]_{ji} && \text{Definition TM} \\
&= [\alpha A^t]_{ji} && \text{Definition MSM}
\end{align*}

Since the matrices $(\alpha A)^t$ and $\alpha A^t$ agree at each entry, Definition ME tells us the
two matrices are equal. ■

Theorem  TT  Transpose  of  a Transpose 
Suppose that $A$ is an $m \times n$ matrix. Then $(A^t)^t = A$.

Proof. We again want to prove an equality of matrices, so we work entry-by-entry
and use Definition ME. For $1 \le i \le m$, $1 \le j \le n$,

\begin{align*}
[(A^t)^t]_{ij} &= [A^t]_{ji} && \text{Definition TM} \\
&= [A]_{ij} && \text{Definition TM}
\end{align*}

Since the matrices $(A^t)^t$ and $A$ agree at each entry, Definition ME tells us the
two matrices are equal. ■
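The three transpose theorems (TMA, TMSM, TT) can be exercised on a concrete example. The sketch below is illustrative only: the helper names and the sample matrices are assumptions, and a passing check on one example is of course no substitute for the proofs.

```python
def transpose(A):
    """Definition TM."""
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]


def mat_add(A, B):
    """Definition MA."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scalar_mult(alpha, A):
    """Definition MSM."""
    return [[alpha * a for a in row] for row in A]


# Hypothetical 2 x 3 matrices and scalar, chosen for illustration.
A = [[1, -2, 0], [4, 3, -5]]
B = [[2, 2, 7], [-1, 0, 6]]
alpha = 3

tma_holds = transpose(mat_add(A, B)) == mat_add(transpose(A), transpose(B))
tmsm_holds = transpose(scalar_mult(alpha, A)) == scalar_mult(alpha, transpose(A))
tt_holds = transpose(transpose(A)) == A

print(tma_holds, tmsm_holds, tt_holds)
```

Note that both sides of the first two comparisons are 3 x 2 matrices even though `A` and `B` are 2 x 3, which is the size bookkeeping the proofs ask you to think about.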


Subsection  MCC 

Matrices  and  Complex  Conjugation 


As  we  did  with  vectors  (Definition  CCCV),  we  can  define  what  it  means  to  take  the 
conjugate  of  a matrix. 

Definition  CCM  Complex  Conjugate  of  a Matrix 

Suppose $A$ is an $m \times n$ matrix. Then the conjugate of $A$, written $\overline{A}$, is an $m \times n$
matrix defined by

\[
[\overline{A}]_{ij} = \overline{[A]_{ij}}
\]

□ 


Example  CCM  Complex  conjugate  of  a matrix 
If
\[
A = \begin{bmatrix} 2 - i & 3 & 5 + 4i \\ -3 + 6i & 2 - 3i & 0 \end{bmatrix}
\]
then
\[
\overline{A} = \begin{bmatrix} 2 + i & 3 & 5 - 4i \\ -3 - 6i & 2 + 3i & 0 \end{bmatrix}
\]
△

The interplay between the conjugate of a matrix and the two operations on
matrices is what you might expect.

Theorem  CRMA  Conjugation  Respects  Matrix  Addition 
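Suppose that $A$ and $B$ are $m \times n$ matrices. Then $\overline{A + B} = \overline{A} + \overline{B}$.

Proof. For $1 \le i \le m$, $1 \le j \le n$,

\begin{align*}
[\overline{A + B}]_{ij} &= \overline{[A + B]_{ij}} && \text{Definition CCM} \\
&= \overline{[A]_{ij} + [B]_{ij}} && \text{Definition MA} \\
&= \overline{[A]_{ij}} + \overline{[B]_{ij}} && \text{Theorem CCRA} \\
&= [\overline{A}]_{ij} + [\overline{B}]_{ij} && \text{Definition CCM} \\
&= [\overline{A} + \overline{B}]_{ij} && \text{Definition MA}
\end{align*}

Since the matrices $\overline{A + B}$ and $\overline{A} + \overline{B}$ are equal in each entry, Definition ME says
that $\overline{A + B} = \overline{A} + \overline{B}$. ■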

Theorem  CRMSM  Conjugation  Respects  Matrix  Scalar  Multiplication 
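Suppose that $\alpha \in \mathbb{C}$ and $A$ is an $m \times n$ matrix. Then $\overline{\alpha A} = \overline{\alpha}\,\overline{A}$.

Proof. For $1 \le i \le m$, $1 \le j \le n$,

\begin{align*}
[\overline{\alpha A}]_{ij} &= \overline{[\alpha A]_{ij}} && \text{Definition CCM} \\
&= \overline{\alpha [A]_{ij}} && \text{Definition MSM} \\
&= \overline{\alpha}\,\overline{[A]_{ij}} && \text{Theorem CCRM} \\
&= \overline{\alpha}\,[\overline{A}]_{ij} && \text{Definition CCM} \\
&= [\overline{\alpha}\,\overline{A}]_{ij} && \text{Definition MSM}
\end{align*}

Since the matrices $\overline{\alpha A}$ and $\overline{\alpha}\,\overline{A}$ are equal in each entry, Definition ME says that
$\overline{\alpha A} = \overline{\alpha}\,\overline{A}$. ■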


Theorem  CCM  Conjugate  of  the  Conjugate  of  a Matrix 
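Suppose that $A$ is an $m \times n$ matrix. Then $\overline{\left(\overline{A}\right)} = A$.

Proof. For $1 \le i \le m$, $1 \le j \le n$,

\begin{align*}
\left[\overline{\left(\overline{A}\right)}\right]_{ij} &= \overline{[\overline{A}]_{ij}} && \text{Definition CCM} \\
&= \overline{\overline{[A]_{ij}}} && \text{Definition CCM} \\
&= [A]_{ij} && \text{Theorem CCT}
\end{align*}

Since the matrices $\overline{\left(\overline{A}\right)}$ and $A$ are equal in each entry, Definition ME says that
$\overline{\left(\overline{A}\right)} = A$. ■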

Finally,  we  will  need  the  following  result  about  matrix  conjugation  and  transposes 
later. 

Theorem  MCT  Matrix  Conjugation  and  Transposes 
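Suppose that $A$ is an $m \times n$ matrix. Then $\overline{(A^t)} = \left(\overline{A}\right)^t$.

Proof. For $1 \le i \le m$, $1 \le j \le n$,

\begin{align*}
\left[\overline{(A^t)}\right]_{ji} &= \overline{[A^t]_{ji}} && \text{Definition CCM} \\
&= \overline{[A]_{ij}} && \text{Definition TM} \\
&= [\overline{A}]_{ij} && \text{Definition CCM} \\
&= \left[\left(\overline{A}\right)^t\right]_{ji} && \text{Definition TM}
\end{align*}

Since the matrices $\overline{(A^t)}$ and $\left(\overline{A}\right)^t$ are equal in each entry, Definition ME says
that $\overline{(A^t)} = \left(\overline{A}\right)^t$. ■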
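Python's built-in complex numbers make it easy to experiment with Definition CCM and Theorem MCT. The sketch below uses the matrix of Example CCM; the helper names `conjugate` and `transpose` are illustrative assumptions, and `1j` is Python's notation for the imaginary unit.

```python
def conjugate(A):
    """Definition CCM: conjugate each entry of A."""
    return [[complex(entry).conjugate() for entry in row] for row in A]


def transpose(A):
    """Definition TM."""
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]


# The 2 x 3 matrix from Example CCM.
A = [
    [2 - 1j, 3, 5 + 4j],
    [-3 + 6j, 2 - 3j, 0],
]

A_bar = conjugate(A)

# Theorem MCT: conjugating and transposing can be done in either order.
mct_holds = conjugate(transpose(A)) == transpose(conjugate(A))

# Theorem CCM: conjugating twice returns the original matrix.
ccm_holds = conjugate(conjugate(A)) == A

print(A_bar)
print(mct_holds, ccm_holds)
```

Real entries such as 3 and 0 are unchanged by conjugation, which matches the example.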

Subsection  AM 
Adjoint  of  a Matrix 

The  combination  of  transposing  and  conjugating  a matrix  will  be  important  in 
subsequent  sections,  such  as  Subsection  MINM.UM  and  Section  OD.  We  make  a 
key  definition  here  and  prove  some  basic  results  in  the  same  spirit  as  those  above. 

Definition  A Adjoint 

If $A$ is a matrix, then its adjoint is $A^* = \left(\overline{A}\right)^t$. □

You will see the adjoint written elsewhere variously as $A^H$, $A^*$ or $A^\dagger$. Notice that
Theorem  MCT  says  it  does  not  really  matter  if  we  conjugate  and  then  transpose,  or 
transpose  and  then  conjugate. 

Theorem  AMA  Adjoint  and  Matrix  Addition 

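Suppose $A$ and $B$ are matrices of the same size. Then $(A + B)^* = A^* + B^*$.

Proof.

\begin{align*}
(A + B)^* &= \left(\overline{A + B}\right)^t && \text{Definition A} \\
&= \left(\overline{A} + \overline{B}\right)^t && \text{Theorem CRMA} \\
&= \left(\overline{A}\right)^t + \left(\overline{B}\right)^t && \text{Theorem TMA} \\
&= A^* + B^* && \text{Definition A}
\end{align*}

■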

Theorem  AMSM  Adjoint  and  Matrix  Scalar  Multiplication 
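Suppose $\alpha \in \mathbb{C}$ is a scalar and $A$ is a matrix. Then $(\alpha A)^* = \overline{\alpha} A^*$.

Proof.

\begin{align*}
(\alpha A)^* &= \left(\overline{\alpha A}\right)^t && \text{Definition A} \\
&= \left(\overline{\alpha}\,\overline{A}\right)^t && \text{Theorem CRMSM} \\
&= \overline{\alpha}\left(\overline{A}\right)^t && \text{Theorem TMSM} \\
&= \overline{\alpha} A^* && \text{Definition A}
\end{align*}

■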

Theorem  AA  Adjoint  of  an  Adjoint 
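Suppose that $A$ is a matrix. Then $(A^*)^* = A$.

Proof.

\begin{align*}
(A^*)^* &= \left(\overline{(A^*)}\right)^t && \text{Definition A} \\
&= \overline{\left((A^*)^t\right)} && \text{Theorem MCT} \\
&= \overline{\left(\left(\left(\overline{A}\right)^t\right)^t\right)} && \text{Definition A} \\
&= \overline{\left(\overline{A}\right)} && \text{Theorem TT} \\
&= A && \text{Theorem CCM}
\end{align*}

■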

Take  note  of  how  the  theorems  in  this  section,  while  simple,  build  on  earlier 
theorems  and  definitions  and  never  descend  to  the  level  of  entry-by-entry  proofs 
based  on  Definition  ME.  In  other  words,  the  equal  signs  that  appear  in  the  previous 
proofs  are  equalities  of  matrices,  not  scalars  (which  is  the  opposite  of  a proof  like 
that  of  Theorem  TMA). 
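The adjoint and its three theorems (AMA, AMSM, AA) can also be tried out numerically. This is a minimal sketch with made-up data; the function names are assumptions for illustration. Note the conjugate on the scalar in the check of Theorem AMSM.

```python
def adjoint(A):
    """Definition A: conjugate every entry, then transpose."""
    m, n = len(A), len(A[0])
    return [[complex(A[j][i]).conjugate() for j in range(m)] for i in range(n)]


def mat_add(A, B):
    """Definition MA."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scalar_mult(alpha, A):
    """Definition MSM."""
    return [[alpha * a for a in row] for row in A]


# Hypothetical 2 x 3 complex matrices and scalar, for illustration only.
A = [[1 + 2j, 3, -1j], [4, 2 - 1j, 0]]
B = [[2, 1j, 1 + 1j], [-3j, 5, 2]]
alpha = 2 - 3j

ama_holds = adjoint(mat_add(A, B)) == mat_add(adjoint(A), adjoint(B))
amsm_holds = adjoint(scalar_mult(alpha, A)) == scalar_mult(alpha.conjugate(), adjoint(A))
aa_holds = adjoint(adjoint(A)) == A

print(ama_holds, amsm_holds, aa_holds)
```

The entries here have small integer real and imaginary parts, so the floating-point arithmetic is exact and `==` is a safe comparison; with general floating-point data a tolerance would be needed.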

Reading  Questions 


1. Perform the following matrix computation.

\[
(6)\begin{bmatrix} 2 & -2 & 8 & 1 \\ 4 & 5 & -1 & 3 \\ 7 & -3 & 0 & 2 \end{bmatrix}
+ (-2)\begin{bmatrix} 2 & 7 & 1 & 2 \\ 3 & -1 & 0 & 5 \\ 1 & 7 & 3 & 3 \end{bmatrix}
\]

2. Theorem VSPM reminds you of what previous theorem? How strong is the similarity?

3.  Compute  the  transpose  of  the  matrix  below. 


\[
\begin{bmatrix} 6 & 8 & 4 \\ -2 & 1 & 0 \\ 9 & -5 & 6 \end{bmatrix}
\]


Exercises 

C10† Let A =


4 -3 
3 0 


B = 


3 2 

-2  -6 


1 


and  C = 


2 

4 

-2 


Let $\alpha = 4$ and $\beta = 1/2$. Perform the following calculations: (1) $A + B$, (2) $A + C$, (3) $B^t + C$, (4) $A + B^t$,
(5) $\beta C$, (6) $4A - 3B$, (7) $A^t + \alpha C$, (8) $A + B - C^t$, (9) $4A + 2B - 5C^t$.

C11† Solve the given vector equation for $x$, or explain why no solution exists:


2 3 
4 2 


-3 


-1 

0 


1 0 
5 -2 


C12† Solve the given matrix equation for $\alpha$, or explain why no solution exists:

+ 


3 4 

1 -1 


3 -6 

1 1 


12  6 
4 -2 


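C13† Solve the given matrix equation for $\alpha$, or explain why no solution exists:

\[
\alpha\begin{bmatrix} 3 & 1 \\ 2 & 0 \\ 1 & 4 \end{bmatrix}
- \begin{bmatrix} 4 & 1 \\ 3 & 2 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 2 & 1 \\ 1 & -2 \\ 2 & 6 \end{bmatrix}
\]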

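C14† Find $\alpha$ and $\beta$ that solve the following equation:

\[
\alpha\begin{bmatrix} 1 & 2 \\ 4 & 1 \end{bmatrix}
+ \beta\begin{bmatrix} 2 & 1 \\ 3 & 1 \end{bmatrix}
= \begin{bmatrix} -1 & 4 \\ 6 & 1 \end{bmatrix}
\]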

In  Chapter  V we  defined  the  operations  of  vector  addition  and  vector  scalar  multiplication 
in  Definition  CVA  and  Definition  CVSM.  These  two  operations  formed  the  underpinnings 
of  the  remainder  of  the  chapter.  We  have  now  defined  similar  operations  for  matrices  in 
Definition  MA  and  Definition  MSM.  You  will  have  noticed  the  resulting  similarities  between 
Theorem  VSPCV  and  Theorem  VSPM. 

In  Exercises  M20-M25,  you  will  be  asked  to  extend  these  similarities  to  other  fundamental 
definitions  and  concepts  we  first  saw  in  Chapter  V.  This  sequence  of  problems  was  suggested 
by  Martin  Jackson. 

M20 Suppose $S = \{B_1, B_2, B_3, \ldots, B_p\}$ is a set of matrices from $M_{mn}$. Formulate
appropriate definitions for the following terms and give an example of the use of each.

1. A linear combination of elements of $S$.

2. A relation of linear dependence on $S$, both trivial and nontrivial.

3. $S$ is a linearly independent set.

4. $\langle S \rangle$.

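M21† Show that the set $S$ is linearly independent in $M_{22}$.

\[
S = \left\{
\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\
\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},\
\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
\right\}
\]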


M22† Determine if the set $S$ below is linearly independent in $M_{23}$.


2 

3 4 ' 

4 -2  2 

— 1 -2  —2 

-11  o' 

-1  2 -2 

1 

3 -2 

5 

0 -1  1 

5 

2 2 2 

J 

-1  0 -2 

5 

0 -1  -2 

M23† Determine if the matrix $A$ is in the span of $S$. In other words, is $A \in \langle S \rangle$? If so
write $A$ as a linear combination of the elements of $S$.


-2  2 

-1  1 


-1 

2 


-2  -2 

2 2 


-11  o' 

-1  2 —2 

-1  0 -2 

0 -1  -2 

M24† Suppose $Y$ is the set of all $3 \times 3$ symmetric matrices (Definition SYM). Find a
set $T$ so that $T$ is linearly independent and $\langle T \rangle = Y$.

M25 Define a subset of $M_{33}$ by

\[
U_{33} = \left\{ A \in M_{33} \mid [A]_{ij} = 0 \text{ whenever } i > j \right\}
\]

Find a set $R$ so that $R$ is linearly independent and $\langle R \rangle = U_{33}$.


T13† Prove Property CM of Theorem VSPM. Write your proof in the style of the proof
of  Property  DSAM  given  in  this  section. 

T14  Prove  Property  AAM  of  Theorem  VSPM.  Write  your  proof  in  the  style  of  the  proof 
of  Property  DSAM  given  in  this  section. 

T17  Prove  Property  SMAM  of  Theorem  VSPM.  Write  your  proof  in  the  style  of  the 
proof  of  Property  DSAM  given  in  this  section. 

T18  Prove  Property  DMAM  of  Theorem  VSPM.  Write  your  proof  in  the  style  of  the 
proof  of  Property  DSAM  given  in  this  section. 


A matrix $A$ is skew-symmetric if $A^t = -A$. Exercises T30-T37 employ this definition.
T30  Prove  that  a skew-symmetric  matrix  is  square.  (Hint:  study  the  proof  of  Theorem 
SMS.) 

T31  Prove  that  a skew-symmetric  matrix  must  have  zeros  for  its  diagonal  elements. 
In other words, if $A$ is skew-symmetric of size $n$, then $[A]_{ii} = 0$ for $1 \le i \le n$. (Hint:
carefully  construct  an  example  of  a 3 x 3 skew-symmetric  matrix  before  attempting  a 
proof.) 

T32  Prove  that  a matrix  A is  both  skew-symmetric  and  symmetric  if  and  only  if  A is 
the  zero  matrix.  (Hint:  one  half  of  this  proof  is  very  easy,  the  other  half  takes  a little 
more work.)

T33 Suppose $A$ and $B$ are both skew-symmetric matrices of the same size and $\alpha, \beta \in \mathbb{C}$.
Prove that $\alpha A + \beta B$ is a skew-symmetric matrix.

T34 Suppose $A$ is a square matrix. Prove that $A + A^t$ is a symmetric matrix.

T35 Suppose $A$ is a square matrix. Prove that $A - A^t$ is a skew-symmetric matrix.

T36  Suppose  A is  a square  matrix.  Prove  that  there  is  a symmetric  matrix  B and  a 

skew-symmetric  matrix  C such  that  A = B + C . In  other  words,  any  square  matrix  can 
be  decomposed  into  a symmetric  matrix  and  a skew-symmetric  matrix  (Proof  Technique 
DC).  (Hint:  consider  building  a proof  on  Exercise  MO.T34  and  Exercise  MO.T35.) 
T37  Prove  that  the  decomposition  in  Exercise  MO.T36  is  unique  (see  Proof  Technique 
U).  (Hint:  a proof  can  turn  on  Exercise  MO.T31.) 


Section  MM 
Matrix  Multiplication 


We  know  how  to  add  vectors  and  how  to  multiply  them  by  scalars.  Together,  these 
operations  give  us  the  possibility  of  making  linear  combinations.  Similarly,  we  know 
how  to  add  matrices  and  how  to  multiply  matrices  by  scalars.  In  this  section  we  mix 
all  these  ideas  together  and  produce  an  operation  known  as  “matrix  multiplication.” 
This  will  lead  to  some  results  that  are  both  surprising  and  central.  We  begin  with  a 
definition  of  how  to  multiply  a vector  by  a matrix. 


Subsection  MVP 
Matrix- Vector  Product 


We  have  repeatedly  seen  the  importance  of  forming  linear  combinations  of  the 
columns  of  a matrix.  As  one  example  of  this,  the  oft-used  Theorem  SLSLC,  said 
that  every  solution  to  a system  of  linear  equations  gives  rise  to  a linear  combination 
of  the  column  vectors  of  the  coefficient  matrix  that  equals  the  vector  of  constants. 
This  theorem,  and  others,  motivate  the  following  central  definition. 


Definition  MVP  Matrix- Vector  Product 

Suppose $A$ is an $m \times n$ matrix with columns $A_1, A_2, A_3, \ldots, A_n$ and $u$ is a vector
of size $n$. Then the matrix-vector product of $A$ with $u$ is the linear combination

\[
Au = [u]_1 A_1 + [u]_2 A_2 + [u]_3 A_3 + \cdots + [u]_n A_n
\]


□ 


So,  the  matrix-vector  product  is  yet  another  version  of  “multiplication,”  at  least 
in  the  sense  that  we  have  yet  again  overloaded  juxtaposition  of  two  symbols  as  our 
notation. Remember your objects: an $m \times n$ matrix times a vector of size $n$ will
create a vector of size $m$. So if $A$ is rectangular, then the size of the vector changes.
With  all  the  linear  combinations  we  have  performed  so  far,  this  computation  should 
now  seem  second  nature. 
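Definition MVP can be sketched in code as an explicit linear combination of columns. This is a minimal illustration, assuming a matrix is a list of rows and a vector is a list; the name `matrix_vector_product` is a hypothetical choice, and the data is taken from Example MTV.

```python
def matrix_vector_product(A, u):
    """Definition MVP: Au is the linear combination of the columns of A
    whose scalars are the entries of u."""
    m, n = len(A), len(A[0])
    if len(u) != n:
        raise ValueError("u must have one entry for each column of A")
    result = [0] * m
    for j in range(n):
        # Add u[j] times column j of A to the running total.
        for i in range(m):
            result[i] += u[j] * A[i][j]
    return result


# Matrix and vector from Example MTV: a 3 x 5 matrix times a vector of size 5.
A = [
    [1, 4, 2, 3, 4],
    [-3, 2, 0, 1, -2],
    [1, 6, -3, -1, 5],
]
u = [2, 1, -2, 3, -1]

print(matrix_vector_product(A, u))
```

The input vector has size 5 and the output has size 3, which is the change of size the paragraph above describes for a rectangular matrix.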


Example  MTV  A matrix  times  a vector 
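Consider
\[
A = \begin{bmatrix} 1 & 4 & 2 & 3 & 4 \\ -3 & 2 & 0 & 1 & -2 \\ 1 & 6 & -3 & -1 & 5 \end{bmatrix}
\qquad
u = \begin{bmatrix} 2 \\ 1 \\ -2 \\ 3 \\ -1 \end{bmatrix}
\]
Then
\[
Au = 2\begin{bmatrix} 1 \\ -3 \\ 1 \end{bmatrix}
+ 1\begin{bmatrix} 4 \\ 2 \\ 6 \end{bmatrix}
+ (-2)\begin{bmatrix} 2 \\ 0 \\ -3 \end{bmatrix}
+ 3\begin{bmatrix} 3 \\ 1 \\ -1 \end{bmatrix}
+ (-1)\begin{bmatrix} 4 \\ -2 \\ 5 \end{bmatrix}
= \begin{bmatrix} 7 \\ 1 \\ 6 \end{bmatrix}
\]

△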

We can now represent systems of linear equations compactly with a matrix-vector
product (Definition MVP) and column vector equality (Definition CVE). This finally
yields a very popular alternative to our unconventional $\mathcal{LS}(A, b)$ notation.

Theorem SLEMM Systems of Linear Equations as Matrix Multiplication
The set of solutions to the linear system $\mathcal{LS}(A, b)$ equals the set of solutions for $x$
in the vector equation $Ax = b$.

Proof.  This  theorem  says  that  two  sets  (of  solutions)  are  equal.  So  we  need  to  show 
that  one  set  of  solutions  is  a subset  of  the  other,  and  vice  versa  (Definition  SE).  Let 
Ai,  A2,  A3,  . . . , An  be  the  columns  of  A.  Both  of  these  set  inclusions  then  follow 
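from the following chain of equivalences (Proof Technique E),

\begin{align*}
& x \text{ is a solution to } \mathcal{LS}(A, b) \\
\iff\ & [x]_1 A_1 + [x]_2 A_2 + [x]_3 A_3 + \cdots + [x]_n A_n = b && \text{Theorem SLSLC} \\
\iff\ & x \text{ is a solution to } Ax = b && \text{Definition MVP}
\end{align*}

■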

Example  MNSLE  Matrix  notation  for  systems  of  linear  equations 
The system of linear equations from Example NSLE,

\begin{align*}
2x_1 + 4x_2 - 3x_3 + 5x_4 + x_5 &= 9 \\
3x_1 + x_2 + x_4 - 3x_5 &= 0 \\
-2x_1 + 7x_2 - 5x_3 + 2x_4 + 2x_5 &= -3
\end{align*}

has coefficient matrix and vector of constants

\[
A = \begin{bmatrix} 2 & 4 & -3 & 5 & 1 \\ 3 & 1 & 0 & 1 & -3 \\ -2 & 7 & -5 & 2 & 2 \end{bmatrix}
\qquad
b = \begin{bmatrix} 9 \\ 0 \\ -3 \end{bmatrix}
\]

and so will be described compactly by the vector equation $Ax = b$. △

The  matrix-vector  product  is  a very  natural  computation.  We  have  motivated  it 
by  its  connections  with  systems  of  equations,  but  here  is  another  example. 

Example  MBC  Money’s  best  cities 

Every  year  Money  magazine  selects  several  cities  in  the  United  States  as  the  “best” 
cities  to  live  in,  based  on  a wide  array  of  statistics  about  each  city.  This  is  an  example 
of  how  the  editors  of  Money  might  arrive  at  a single  number  that  consolidates  the 
statistics  about  a city.  We  will  analyze  Los  Angeles,  Chicago  and  New  York  City, 
based on four criteria: average high temperature in July (Fahrenheit), number of
colleges and universities in a 30-mile radius, number of toxic waste sites in the
Superfund environmental clean-up program and a personal crime index based on FBI
statistics (average = 100, smaller is safer). It should be apparent how to generalize
the  example  to  a greater  number  of  cities  and  a greater  number  of  statistics. 

We  begin  by  building  a table  of  statistics.  The  rows  will  be  labeled  with  the 
cities,  and  the  columns  with  statistical  categories.  These  values  are  from  Money's 
website  in  early  2005. 


City          Temp   Colleges   Superfund   Crime
Los Angeles     77         28          93     254
Chicago         84         38          85     363
New York        84         99           1     193

Conceivably  these  data  might  reside  in  a spreadsheet.  Now  we  must  combine 
the  statistics  for  each  city.  We  could  accomplish  this  by  weighting  each  category, 
scaling  the  values  and  summing  them.  The  sizes  of  the  weights  would  depend  upon 
the  numerical  size  of  each  statistic  generally,  but  more  importantly,  they  would 
reflect the editors' opinions or beliefs about which statistics were most important to
their  readers.  Is  the  crime  index  more  important  than  the  number  of  colleges  and 
universities?  Of  course,  there  is  no  right  answer  to  this  question. 

Suppose  the  editors  finally  decide  on  the  following  weights  to  employ:  temperature, 
0.23;  colleges,  0.46;  Superfund,  —0.05;  crime,  —0.20.  Notice  how  negative  weights 
are  used  for  undesirable  statistics.  Then,  for  example,  the  editors  would  compute  for 
Los  Angeles, 

(0.23)(77)  + (0.46)  (28)  + (-0.05)(93)  + (-0.20)(254)  = -24.86 


This  computation  might  remind  you  of  an  inner  product,  but  we  will  produce 
the  computations  for  all  of  the  cities  as  a matrix- vector  product.  Write  the  table  of 
raw  statistics  as  a matrix 


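\[
T = \begin{bmatrix} 77 & 28 & 93 & 254 \\ 84 & 38 & 85 & 363 \\ 84 & 99 & 1 & 193 \end{bmatrix}
\]

and the weights as a vector

\[
w = \begin{bmatrix} 0.23 \\ 0.46 \\ -0.05 \\ -0.20 \end{bmatrix}
\]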

then  the  matrix-vector  product  (Definition  MVP)  yields 


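\[
Tw = (0.23)\begin{bmatrix} 77 \\ 84 \\ 84 \end{bmatrix}
+ (0.46)\begin{bmatrix} 28 \\ 38 \\ 99 \end{bmatrix}
+ (-0.05)\begin{bmatrix} 93 \\ 85 \\ 1 \end{bmatrix}
+ (-0.20)\begin{bmatrix} 254 \\ 363 \\ 193 \end{bmatrix}
= \begin{bmatrix} -24.86 \\ -40.05 \\ 26.21 \end{bmatrix}
\]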

This  vector  contains  a single  number  for  each  of  the  cities  being  studied,  so  the 
editors would rank New York best (26.21), Los Angeles next (-24.86), and Chicago
third (-40.05). Of course, the mayor’s offices in Chicago and Los Angeles are free to
counter with a different set of weights that cause their city to be ranked best. These
alternative weights would be chosen to play to each city’s strengths, and minimize
its problem areas.

If a spreadsheet were used to make these computations, a row of weights would
be entered somewhere near the table of data and the formulas in the spreadsheet
would effect a matrix-vector product. This example is meant to illustrate how “linear”
computations  (addition,  multiplication)  can  be  organized  as  a matrix-vector  product. 

Another  example  would  be  the  matrix  of  numerical  scores  on  examinations  and 
exercises  for  students  in  a class.  The  rows  would  be  indexed  by  students  and  the 
columns  would  be  indexed  by  exams  and  assignments.  The  instructor  could  then 
assign  weights  to  the  different  exams  and  assignments,  and  via  a matrix-vector 
product, compute a single score for each student. △
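The scoring in Example MBC is short enough to reproduce in a few lines. The sketch below uses the table and weights from the example; the function name and the rounding to two decimal places are assumptions made for illustration (the rounding only hides floating-point noise).

```python
def matrix_vector_product(A, u):
    """Definition MVP: linear combination of the columns of A with scalars from u."""
    m, n = len(A), len(A[0])
    result = [0.0] * m
    for j in range(n):
        for i in range(m):
            result[i] += u[j] * A[i][j]
    return result


cities = ["Los Angeles", "Chicago", "New York"]

# Rows: cities. Columns: temperature, colleges, Superfund sites, crime index.
T = [
    [77, 28, 93, 254],
    [84, 38, 85, 363],
    [84, 99, 1, 193],
]

# Weights from the example; negative weights penalize undesirable statistics.
w = [0.23, 0.46, -0.05, -0.20]

scores = [round(s, 2) for s in matrix_vector_product(T, w)]

# Rank the cities from highest score to lowest.
ranking = [city for _, city in sorted(zip(scores, cities), reverse=True)]

print(scores)
print(ranking)
```

Changing the list `w` and rerunning is the code analogue of a mayor's office proposing a different set of weights.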

Later  (much  later)  we  will  need  the  following  theorem,  which  is  really  a technical 
lemma  (see  Proof  Technique  LC).  Since  we  are  in  a position  to  prove  it  now,  we 
will.  But  you  can  safely  skip  it  for  the  moment,  if  you  promise  to  come  back  later  to 
study  the  proof  when  the  theorem  is  employed.  At  that  point  you  will  also  be  able 
to  understand  the  comments  in  the  paragraph  following  the  proof. 

Theorem  EMMVP  Equal  Matrices  and  Matrix- Vector  Products 

Suppose that $A$ and $B$ are $m \times n$ matrices such that $Ax = Bx$ for every $x \in \mathbb{C}^n$.
Then $A = B$.

Proof. We are assuming $Ax = Bx$ for all $x \in \mathbb{C}^n$, so we can employ this equality
for any choice of the vector $x$. However, we will limit our use of this equality to the
standard unit vectors, $e_j$, $1 \le j \le n$ (Definition SUV). For all $1 \le j \le n$, $1 \le i \le m$,

\begin{align*}
[A]_{ij} &= 0[A]_{i1} + \cdots + 0[A]_{i,j-1} + 1[A]_{ij} + 0[A]_{i,j+1} + \cdots + 0[A]_{in} && \text{Theorem PCNA} \\
&= [e_j]_1 [A]_{i1} + [e_j]_2 [A]_{i2} + [e_j]_3 [A]_{i3} + \cdots + [e_j]_n [A]_{in} && \text{Definition SUV} \\
&= [A]_{i1} [e_j]_1 + [A]_{i2} [e_j]_2 + [A]_{i3} [e_j]_3 + \cdots + [A]_{in} [e_j]_n && \text{Property CMCN} \\
&= [Ae_j]_i && \text{Definition MVP} \\
&= [Be_j]_i && \text{Hypothesis} \\
&= [B]_{i1} [e_j]_1 + [B]_{i2} [e_j]_2 + [B]_{i3} [e_j]_3 + \cdots + [B]_{in} [e_j]_n && \text{Definition MVP} \\
&= [e_j]_1 [B]_{i1} + [e_j]_2 [B]_{i2} + [e_j]_3 [B]_{i3} + \cdots + [e_j]_n [B]_{in} && \text{Property CMCN} \\
&= 0[B]_{i1} + \cdots + 0[B]_{i,j-1} + 1[B]_{ij} + 0[B]_{i,j+1} + \cdots + 0[B]_{in} && \text{Definition SUV} \\
&= [B]_{ij} && \text{Theorem PCNA}
\end{align*}

So by Definition ME the matrices $A$ and $B$ are equal, as desired. ■


You might notice from studying the proof that the hypotheses of this theorem
could be “weakened” (i.e. made less restrictive). We need only suppose the equality
of the matrix-vector products for just the standard unit vectors (Definition SUV) or
any other spanning set (Definition SSVS) of $\mathbb{C}^n$ (Exercise LISS.T40). However, in
practice,  when  we  apply  this  theorem  the  stronger  hypothesis  will  be  in  effect  so  this 
version  of  the  theorem  will  suffice  for  our  purposes.  (If  we  changed  the  statement  of 
the  theorem  to  have  the  less  restrictive  hypothesis,  then  we  would  call  the  theorem 
“stronger.”) 
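The key step in the proof of Theorem EMMVP is that the product of a matrix with the standard unit vector $e_j$ is column $j$ of the matrix, so these $n$ products determine the matrix completely. A minimal sketch, with a made-up matrix and hypothetical helper names:

```python
def matrix_vector_product(A, u):
    """Definition MVP: linear combination of the columns of A with scalars from u."""
    m, n = len(A), len(A[0])
    result = [0] * m
    for j in range(n):
        for i in range(m):
            result[i] += u[j] * A[i][j]
    return result


def standard_unit_vector(n, j):
    """Definition SUV: the vector of size n with a 1 in position j (1-based)
    and zeros elsewhere."""
    return [1 if k == j else 0 for k in range(1, n + 1)]


# Hypothetical 2 x 3 matrix, chosen only for illustration.
A = [[2, -1, 5], [0, 3, 4]]
m, n = len(A), len(A[0])

# A e_j for j = 1, ..., n: each product is one column of A.
columns = [matrix_vector_product(A, standard_unit_vector(n, j)) for j in range(1, n + 1)]

# Reassemble the columns into a list of rows and compare with A.
rebuilt = [[columns[j][i] for j in range(n)] for i in range(m)]

print(columns)
print(rebuilt == A)
```

If two matrices give the same products on every standard unit vector, they have the same columns, which is the weakened hypothesis discussed above.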


Subsection  MM 
Matrix  Multiplication 


We  now  define  how  to  multiply  two  matrices  together.  Stop  for  a minute  and  think 
about  how  you  might  define  this  new  operation. 

Many  books  would  present  this  definition  much  earlier  in  the  course.  However, 
we  have  taken  great  care  to  delay  it  as  long  as  possible  and  to  present  as  many 
ideas  as  practical  based  mostly  on  the  notion  of  linear  combinations.  Towards  the 
conclusion  of  the  course,  or  when  you  perhaps  take  a second  course  in  linear  algebra, 
you  may  be  in  a position  to  appreciate  the  reasons  for  this.  For  now,  understand 
that  matrix  multiplication  is  a central  definition  and  perhaps  you  will  appreciate  its 
importance  more  by  having  saved  it  for  later. 


Definition  MM  Matrix  Multiplication 

Suppose $A$ is an $m \times n$ matrix and $B_1, B_2, B_3, \ldots, B_p$ are the columns of an $n \times p$
matrix $B$. Then the matrix product of $A$ with $B$ is the $m \times p$ matrix where column
$i$ is the matrix-vector product $AB_i$. Symbolically,

\[
AB = A[B_1 | B_2 | B_3 | \ldots | B_p] = [AB_1 | AB_2 | AB_3 | \ldots | AB_p].
\]

□

Example PTM Product of two matrices
Set

\[
A = \begin{bmatrix} 1 & 2 & -1 & 4 & 6 \\ 0 & -4 & 1 & 2 & 3 \\ -5 & 1 & 2 & -3 & 4 \end{bmatrix}
\qquad
B = \begin{bmatrix} 1 & 6 & 2 & 1 \\ -1 & 4 & 3 & 2 \\ 1 & 1 & 2 & 3 \\ 6 & 4 & -1 & 2 \\ 1 & -2 & 3 & 0 \end{bmatrix}
\]

Then

\[
AB = \left[
A\begin{bmatrix} 1 \\ -1 \\ 1 \\ 6 \\ 1 \end{bmatrix} \,\middle|\,
A\begin{bmatrix} 6 \\ 4 \\ 1 \\ 4 \\ -2 \end{bmatrix} \,\middle|\,
A\begin{bmatrix} 2 \\ 3 \\ 2 \\ -1 \\ 3 \end{bmatrix} \,\middle|\,
A\begin{bmatrix} 1 \\ 2 \\ 3 \\ 2 \\ 0 \end{bmatrix}
\right]
= \begin{bmatrix} 28 & 17 & 20 & 10 \\ 20 & -13 & -3 & -1 \\ -18 & -44 & 12 & -3 \end{bmatrix}
\]

△

Is this the definition of matrix multiplication you expected? Perhaps our previous
operations for matrices caused you to think that we might multiply two matrices of
the same size, entry-by-entry? Notice that our current definition uses matrices of
different sizes (though the number of columns in the first must equal the number
of rows in the second), and the result is of a third size. Notice too in the previous
example that we cannot even consider the product $BA$, since the sizes of the two
matrices in this order are not right.
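Definition MM builds the product one column at a time, and that structure can be mirrored in code. The sketch below is one possible rendering, with hypothetical function names; it uses the matrices of Example PTM and also shows that the product in the other order is rejected because the sizes do not match.

```python
def matrix_vector_product(A, u):
    """Definition MVP: linear combination of the columns of A with scalars from u."""
    m, n = len(A), len(A[0])
    if len(u) != n:
        raise ValueError("size of u must equal the number of columns of A")
    result = [0] * m
    for j in range(n):
        for i in range(m):
            result[i] += u[j] * A[i][j]
    return result


def matrix_product(A, B):
    """Definition MM: column j of AB is A times column j of B."""
    n, p = len(B), len(B[0])
    if len(A[0]) != n:
        raise ValueError("columns of A must match rows of B")
    product_columns = [
        matrix_vector_product(A, [B[k][j] for k in range(n)]) for j in range(p)
    ]
    # Reassemble the list of columns into a list of rows.
    return [[product_columns[j][i] for j in range(p)] for i in range(len(A))]


# The 3 x 5 and 5 x 4 matrices from Example PTM.
A = [[1, 2, -1, 4, 6], [0, -4, 1, 2, 3], [-5, 1, 2, -3, 4]]
B = [[1, 6, 2, 1], [-1, 4, 3, 2], [1, 1, 2, 3], [6, 4, -1, 2], [1, -2, 3, 0]]

AB = matrix_product(A, B)
for row in AB:
    print(row)

try:
    matrix_product(B, A)
    ba_defined = True
except ValueError:
    ba_defined = False
print(ba_defined)
```

The size check is the code version of the requirement that the number of columns in the first matrix equal the number of rows in the second.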

But  it  gets  weirder  than  that.  Many  of  your  old  ideas  about  “multiplication”  will 
not  apply  to  matrix  multiplication,  but  some  still  will.  So  make  no  assumptions,  and 
do  not  do  anything  until  you  have  a theorem  that  says  you  can.  Even  if  the  sizes  are 
right,  matrix  multiplication  is  not  commutative  — order  matters. 


Example  MMNC  Matrix  multiplication  is  not  commutative 
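Set

\[
A = \begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix}
\qquad
B = \begin{bmatrix} 4 & 0 \\ 5 & 1 \end{bmatrix}
\]

Then we have two square, $2 \times 2$ matrices, so Definition MM allows us to multiply
them in either order. We find

\[
AB = \begin{bmatrix} 19 & 3 \\ 6 & 2 \end{bmatrix}
\qquad
BA = \begin{bmatrix} 4 & 12 \\ 4 & 17 \end{bmatrix}
\]

and $AB \neq BA$. Not even close. It should not be hard for you to construct other
pairs of matrices that do not commute (try a couple of $3 \times 3$’s). Can you find a pair
of non-identical matrices that do commute? △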
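A quick computation confirms that order matters. In the sketch below the two matrices are an assumption: one pair of $2 \times 2$ matrices chosen so that the two products agree with those stated in Example MMNC. The function `matrix_product` is a hypothetical helper that sums over the common index.

```python
def matrix_product(A, B):
    """Product of an m x n matrix A and an n x p matrix B, as lists of rows."""
    m, n, p = len(A), len(B), len(B[0])
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
        for i in range(m)
    ]


# Assumed pair of matrices whose products match Example MMNC.
A = [[1, 3], [-1, 2]]
B = [[4, 0], [5, 1]]

AB = matrix_product(A, B)
BA = matrix_product(B, A)

print(AB)
print(BA)
print(AB == BA)
```

Swapping in your own $3 \times 3$ matrices is an easy way to hunt for other non-commuting pairs, and for the occasional pair that does commute.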


Subsection  MMEE 

Matrix  Multiplication,  Entry-by-Entry 

While  certain  “natural”  properties  of  multiplication  do  not  hold,  many  more  do.  In 
the  next  subsection,  we  will  state  and  prove  the  relevant  theorems.  But  first,  we 
need  a theorem  that  provides  an  alternate  means  of  multiplying  two  matrices.  In 
many  texts,  this  would  be  given  as  the  definition  of  matrix  multiplication.  We  prefer 
to  turn  it  around  and  have  the  following  formula  as  a consequence  of  our  definition. 
It  will  prove  useful  for  proofs  of  matrix  equality,  where  we  need  to  examine  products 
of  matrices,  entry-by-entry. 

Theorem  EMP  Entries  of  Matrix  Products 

Suppose $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix. Then for $1 \le i \le m$,
$1 \le j \le p$, the individual entries of $AB$ are given by

\begin{align*}
[AB]_{ij} &= [A]_{i1}[B]_{1j} + [A]_{i2}[B]_{2j} + [A]_{i3}[B]_{3j} + \cdots + [A]_{in}[B]_{nj} \\
&= \sum_{k=1}^{n} [A]_{ik}[B]_{kj}
\end{align*}

Proof. Let the vectors $A_1, A_2, A_3, \ldots, A_n$ denote the columns of $A$ and let the
vectors $B_1, B_2, B_3, \ldots, B_p$ denote the columns of $B$. Then for $1 \le i \le m$, $1 \le j \le p$,

\begin{align*}
[AB]_{ij} &= [AB_j]_i && \text{Definition MM} \\
&= \left[ [B_j]_1 A_1 + [B_j]_2 A_2 + \cdots + [B_j]_n A_n \right]_i && \text{Definition MVP} \\
&= \left[ [B_j]_1 A_1 \right]_i + \left[ [B_j]_2 A_2 \right]_i + \cdots + \left[ [B_j]_n A_n \right]_i && \text{Definition CVA} \\
&= [B_j]_1 [A_1]_i + [B_j]_2 [A_2]_i + \cdots + [B_j]_n [A_n]_i && \text{Definition CVSM} \\
&= [B]_{1j} [A]_{i1} + [B]_{2j} [A]_{i2} + \cdots + [B]_{nj} [A]_{in} && \text{Definition M} \\
&= [A]_{i1} [B]_{1j} + [A]_{i2} [B]_{2j} + \cdots + [A]_{in} [B]_{nj} && \text{Property CMCN} \\
&= \sum_{k=1}^{n} [A]_{ik} [B]_{kj}
\end{align*}

■


Example  PTMEE  Product  of  two  matrices,  entry-by-entry 

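Consider again the two matrices from Example PTM

\[
A = \begin{bmatrix} 1 & 2 & -1 & 4 & 6 \\ 0 & -4 & 1 & 2 & 3 \\ -5 & 1 & 2 & -3 & 4 \end{bmatrix}
\qquad
B = \begin{bmatrix} 1 & 6 & 2 & 1 \\ -1 & 4 & 3 & 2 \\ 1 & 1 & 2 & 3 \\ 6 & 4 & -1 & 2 \\ 1 & -2 & 3 & 0 \end{bmatrix}
\]

Then suppose we just wanted the entry of $AB$ in the second row, third column:

\begin{align*}
[AB]_{23} &= [A]_{21}[B]_{13} + [A]_{22}[B]_{23} + [A]_{23}[B]_{33} + [A]_{24}[B]_{43} + [A]_{25}[B]_{53} \\
&= (0)(2) + (-4)(3) + (1)(2) + (2)(-1) + (3)(3) = -3
\end{align*}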

Notice  how  there  are  5 terms  in  the  sum,  since  5 is  the  common  dimension  of  the 
two  matrices  (column  count  for  A , row  count  for  B).  In  the  conclusion  of  Theorem 
EMP,  it  would  be  the  index  k that  would  run  from  1 to  5 in  this  computation.  Here 
is  a bit  more  practice. 

The entry in the third row, first column:

\begin{align*}
[AB]_{31} &= [A]_{31}[B]_{11} + [A]_{32}[B]_{21} + [A]_{33}[B]_{31} + [A]_{34}[B]_{41} + [A]_{35}[B]_{51} \\
&= (-5)(1) + (1)(-1) + (2)(1) + (-3)(6) + (4)(1) = -18
\end{align*}

To  get  some  more  practice  on  your  own,  complete  the  computation  of  the  other 
10  entries  of  this  product.  Construct  some  other  pairs  of  matrices  (of  compatible 
sizes)  and  compute  their  product  two  ways.  First  use  Definition  MM.  Since  linear 
combinations  are  straightforward  for  you  now,  this  should  be  easy  to  do  and  to  do 
correctly.  Then  do  it  again,  using  Theorem  EMP.  Since  this  process  may  take  some 
practice, use your first computation to check your work. △
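The two ways of computing a product suggested above can be sketched in a few lines of plain Python (used here instead of Sage so the example stands alone). The function names `entry`, `product` and `product_by_columns` are our own choices for illustration, and matrices are assumed to be stored as lists of rows.

```python
# Matrices A (3 x 5) and B (5 x 4) from Example PTM, stored as lists of rows.
A = [[1, 2, -1, 4, 6],
     [0, -4, 1, 2, 3],
     [-5, 1, 2, -3, 4]]
B = [[1, 6, 2, 1],
     [-1, 4, 3, 2],
     [1, 1, 2, 3],
     [6, 4, -1, 2],
     [1, -2, 3, 0]]

def entry(A, B, i, j):
    """Entry [AB]_ij via Theorem EMP, with 1-based indices i and j."""
    n = len(B)  # common dimension: columns of A, rows of B
    return sum(A[i - 1][k] * B[k][j - 1] for k in range(n))

def product(A, B):
    """Whole product, built one entry at a time (Theorem EMP)."""
    m, p = len(A), len(B[0])
    return [[entry(A, B, i, j) for j in range(1, p + 1)]
            for i in range(1, m + 1)]

def product_by_columns(A, B):
    """Whole product via Definition MM: column j of AB is the linear
    combination of the columns of A with scalars from column j of B."""
    m, n, p = len(A), len(B), len(B[0])
    cols = []
    for j in range(p):
        col = [0] * m
        for k in range(n):
            for i in range(m):
                col[i] += B[k][j] * A[i][k]
        cols.append(col)
    # Convert the list of columns back into a list of rows.
    return [[cols[j][i] for j in range(p)] for i in range(m)]

print(entry(A, B, 2, 3))   # -3
print(entry(A, B, 3, 1))   # -18
print(product(A, B) == product_by_columns(A, B))   # True
```

Both routes must agree on every entry, which is exactly the content of Theorem EMP.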

Theorem  EMP  is  the  way  many  people  compute  matrix  products  by  hand.  It 
will also be very useful for the theorems we are going to prove shortly. However, the
definition  (Definition  MM)  is  frequently  the  most  useful  for  its  connections  with 
deeper  ideas  like  the  null  space  and  the  upcoming  column  space. 

Subsection  PMM 

Properties  of  Matrix  Multiplication 

In  this  subsection,  we  collect  properties  of  matrix  multiplication  and  its  interaction 
with  the  zero  matrix  (Definition  ZM),  the  identity  matrix  (Definition  IM),  matrix 
addition  (Definition  MA),  scalar  matrix  multiplication  (Definition  MSM),  the  inner 
product (Definition IP), conjugation (Theorem MMCC), and the transpose (Definition TM). Whew! Here we go. These are great proofs to practice with, so try to
concoct the proofs before reading them; they will get progressively more complicated
as we go.

Theorem  MMZM  Matrix  Multiplication  and  the  Zero  Matrix 
Suppose  A is  an  m x n matrix.  Then 

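1. $A\mathcal{O}_{n \times p} = \mathcal{O}_{m \times p}$

2. $\mathcal{O}_{p \times m}A = \mathcal{O}_{p \times n}$

Proof. We will prove (1) and leave (2) to you. Entry-by-entry, for $1 \le i \le m$, $1 \le j \le p$,

\begin{align*}
[A\mathcal{O}_{n \times p}]_{ij} &= \sum_{k=1}^{n} [A]_{ik} [\mathcal{O}_{n \times p}]_{kj} && \text{Theorem EMP} \\
&= \sum_{k=1}^{n} [A]_{ik}\, 0 && \text{Definition ZM} \\
&= \sum_{k=1}^{n} 0 \\
&= 0 && \text{Property ZCN} \\
&= [\mathcal{O}_{m \times p}]_{ij} && \text{Definition ZM}
\end{align*}

So by the definition of matrix equality (Definition ME), the matrices $A\mathcal{O}_{n \times p}$ and $\mathcal{O}_{m \times p}$ are equal. ■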

Theorem  MMIM  Matrix  Multiplication  and  Identity  Matrix 
Suppose  A is  an  m x n matrix.  Then 


1. $AI_n = A$

2. $I_mA = A$

Proof. Again, we will prove (1) and leave (2) to you. Entry-by-entry, for $1 \le i \le m$, $1 \le j \le n$,

\begin{align*}
[AI_n]_{ij} &= \sum_{k=1}^{n} [A]_{ik} [I_n]_{kj} && \text{Theorem EMP} \\
&= [A]_{ij} [I_n]_{jj} + \sum_{k=1,\, k \ne j}^{n} [A]_{ik} [I_n]_{kj} && \text{Property CACN} \\
&= [A]_{ij}(1) + \sum_{k=1,\, k \ne j}^{n} [A]_{ik}(0) && \text{Definition IM} \\
&= [A]_{ij} + \sum_{k=1,\, k \ne j}^{n} 0 \\
&= [A]_{ij}
\end{align*}

So the matrices $A$ and $AI_n$ are equal, entry-by-entry, and by the definition of
matrix equality (Definition ME) we can say they are equal matrices. ■

It  is  this  theorem  that  gives  the  identity  matrix  its  name.  It  is  a matrix  that 
behaves  with  matrix  multiplication  like  the  scalar  1 does  with  scalar  multiplication. 
To  multiply  by  the  identity  matrix  is  to  have  no  effect  on  the  other  matrix. 

Theorem  MMDAA  Matrix  Multiplication  Distributes  Across  Addition 
Suppose  A is  an  m x n matrix  and  B and  C are  n x p matrices  and  D is  a p x s 
matrix.  Then 


1. $A(B + C) = AB + AC$

2. $(B + C)D = BD + CD$

Proof. We will do (1), you do (2). Entry-by-entry, for $1 \le i \le m$, $1 \le j \le p$,

\begin{align*}
[A(B + C)]_{ij} &= \sum_{k=1}^{n} [A]_{ik} [B + C]_{kj} && \text{Theorem EMP} \\
&= \sum_{k=1}^{n} [A]_{ik} \left([B]_{kj} + [C]_{kj}\right) && \text{Definition MA} \\
&= \sum_{k=1}^{n} [A]_{ik}[B]_{kj} + [A]_{ik}[C]_{kj} && \text{Property DCN} \\
&= \sum_{k=1}^{n} [A]_{ik}[B]_{kj} + \sum_{k=1}^{n} [A]_{ik}[C]_{kj} && \text{Property CACN} \\
&= [AB]_{ij} + [AC]_{ij} && \text{Theorem EMP} \\
&= [AB + AC]_{ij} && \text{Definition MA}
\end{align*}

So the matrices $A(B + C)$ and $AB + AC$ are equal, entry-by-entry, and by the
definition of matrix equality (Definition ME) we can say they are equal matrices. ■

Theorem  MMSMM  Matrix  Multiplication  and  Scalar  Matrix  Multiplication 
Suppose $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix. Let $\alpha$ be a scalar. Then
$\alpha(AB) = (\alpha A)B = A(\alpha B)$.

Proof. These are equalities of matrices. We will do the first one, the second is similar
and will be good practice for you. For $1 \le i \le m$, $1 \le j \le p$,

\begin{align*}
[\alpha(AB)]_{ij} &= \alpha [AB]_{ij} && \text{Definition MSM} \\
&= \alpha \sum_{k=1}^{n} [A]_{ik}[B]_{kj} && \text{Theorem EMP} \\
&= \sum_{k=1}^{n} \alpha [A]_{ik}[B]_{kj} && \text{Property DCN} \\
&= \sum_{k=1}^{n} [\alpha A]_{ik}[B]_{kj} && \text{Definition MSM} \\
&= [(\alpha A)B]_{ij} && \text{Theorem EMP}
\end{align*}

So the matrices $\alpha(AB)$ and $(\alpha A)B$ are equal, entry-by-entry, and by the definition
of matrix equality (Definition ME) we can say they are equal matrices. ■


Theorem  MMA  Matrix  Multiplication  is  Associative 

Suppose $A$ is an $m \times n$ matrix, $B$ is an $n \times p$ matrix and $D$ is a $p \times s$ matrix. Then
$A(BD) = (AB)D$.

Proof. A matrix equality, so we will go entry-by-entry, no surprise there. For $1 \le i \le m$, $1 \le j \le s$,

\begin{align*}
[A(BD)]_{ij} &= \sum_{k=1}^{n} [A]_{ik}[BD]_{kj} && \text{Theorem EMP} \\
&= \sum_{k=1}^{n} [A]_{ik} \left(\sum_{\ell=1}^{p} [B]_{k\ell}[D]_{\ell j}\right) && \text{Theorem EMP} \\
&= \sum_{k=1}^{n} \sum_{\ell=1}^{p} [A]_{ik}[B]_{k\ell}[D]_{\ell j} && \text{Property DCN}
\end{align*}

We can switch the order of the summation since these are finite sums,

\begin{align*}
&= \sum_{\ell=1}^{p} \sum_{k=1}^{n} [A]_{ik}[B]_{k\ell}[D]_{\ell j} && \text{Property CACN}
\end{align*}

As $[D]_{\ell j}$ does not depend on the index $k$, we can use distributivity to move it outside
of the inner sum,

\begin{align*}
&= \sum_{\ell=1}^{p} [D]_{\ell j} \left(\sum_{k=1}^{n} [A]_{ik}[B]_{k\ell}\right) && \text{Property DCN} \\
&= \sum_{\ell=1}^{p} [D]_{\ell j} [AB]_{i\ell} && \text{Theorem EMP} \\
&= \sum_{\ell=1}^{p} [AB]_{i\ell} [D]_{\ell j} && \text{Property CMCN} \\
&= [(AB)D]_{ij} && \text{Theorem EMP}
\end{align*}

So the matrices $(AB)D$ and $A(BD)$ are equal, entry-by-entry, and by the definition
of matrix equality (Definition ME) we can say they are equal matrices. ■

Since Theorem MMA says matrix multiplication is associative, it means we do
not have to be careful about the order in which we perform matrix multiplication,
nor how we parenthesize an expression with just several matrices multiplied together.
So this is where we draw the line on explaining every last detail in a proof. We will
frequently add, remove, or rearrange parentheses with no comment. Indeed, I only
see about a dozen places where Theorem MMA is cited in a proof. You could try to
count how many times we avoid making a reference to this theorem.

The statement of our next theorem is technically inaccurate. If we upgrade the
vectors $\mathbf{u}$, $\mathbf{v}$ to matrices with a single column, then the expression $\overline{\mathbf{u}}^{t}\mathbf{v}$ is a $1 \times 1$
matrix, though we will treat this small matrix as if it were simply the scalar quantity
in its lone entry. When we apply Theorem MMIP there should not be any confusion.
Notice that if we treat a column vector as a matrix with a single column, then we
can also construct the adjoint of a vector, though we will not make this a common
practice.

Theorem MMIP Matrix Multiplication and Inner Products
If we consider the vectors $\mathbf{u}, \mathbf{v} \in \mathbb{C}^m$ as $m \times 1$ matrices then

\begin{align*}
\langle \mathbf{u}, \mathbf{v} \rangle = \overline{\mathbf{u}}^{t}\mathbf{v} = \mathbf{u}^{*}\mathbf{v}
\end{align*}

Proof.

\begin{align*}
\langle \mathbf{u}, \mathbf{v} \rangle &= \sum_{k=1}^{m} \overline{[\mathbf{u}]_k}\, [\mathbf{v}]_k && \text{Definition IP} \\
&= \sum_{k=1}^{m} \overline{[\mathbf{u}]_{k1}}\, [\mathbf{v}]_{k1} && \text{Column vectors as matrices} \\
&= \sum_{k=1}^{m} [\overline{\mathbf{u}}]_{k1} [\mathbf{v}]_{k1} && \text{Definition CCM} \\
&= \sum_{k=1}^{m} [\overline{\mathbf{u}}^{t}]_{1k} [\mathbf{v}]_{k1} && \text{Definition TM} \\
&= [\overline{\mathbf{u}}^{t}\mathbf{v}]_{11} && \text{Theorem EMP}
\end{align*}

To finish we just blur the distinction between a $1 \times 1$ matrix ($\overline{\mathbf{u}}^{t}\mathbf{v}$) and its lone
entry. ■

Theorem  MMCC  Matrix  Multiplication  and  Complex  Conjugation 
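Suppose $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix. Then $\overline{AB} = \overline{A}\,\overline{B}$.

Proof. To obtain this matrix equality, we will work entry-by-entry. For $1 \le i \le m$, $1 \le j \le p$,

\begin{align*}
[\overline{AB}]_{ij} &= \overline{[AB]_{ij}} && \text{Definition CCM} \\
&= \overline{\sum_{k=1}^{n} [A]_{ik}[B]_{kj}} && \text{Theorem EMP} \\
&= \sum_{k=1}^{n} \overline{[A]_{ik}[B]_{kj}} && \text{Theorem CCRA} \\
&= \sum_{k=1}^{n} \overline{[A]_{ik}}\;\overline{[B]_{kj}} && \text{Theorem CCRM} \\
&= \sum_{k=1}^{n} [\overline{A}]_{ik} [\overline{B}]_{kj} && \text{Definition CCM} \\
&= [\overline{A}\,\overline{B}]_{ij} && \text{Theorem EMP}
\end{align*}

So the matrices $\overline{AB}$ and $\overline{A}\,\overline{B}$ are equal, entry-by-entry, and by the definition of
matrix equality (Definition ME) we can say they are equal matrices. ■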


Another  theorem  in  this  style,  and  it  is  a good  one.  If  you  have  been  practicing 
with  the  previous  proofs  you  should  be  able  to  do  this  one  yourself. 

Theorem  MMT  Matrix  Multiplication  and  Transposes 

Suppose $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix. Then $(AB)^{t} = B^{t}A^{t}$.

Proof. This theorem may be surprising but if we check the sizes of the matrices
involved, then maybe it will not seem so far-fetched. First, $AB$ has size $m \times p$, so
its transpose has size $p \times m$. The product of $B^{t}$ with $A^{t}$ is a $p \times n$ matrix times an
$n \times m$ matrix, also resulting in a $p \times m$ matrix. So at least our objects are compatible
for equality (and would not be, in general, if we did not reverse the order of the
matrix multiplication).

Here we go again, entry-by-entry. For $1 \le i \le m$, $1 \le j \le p$,

\begin{align*}
[(AB)^{t}]_{ji} &= [AB]_{ij} && \text{Definition TM} \\
&= \sum_{k=1}^{n} [A]_{ik}[B]_{kj} && \text{Theorem EMP} \\
&= \sum_{k=1}^{n} [B]_{kj}[A]_{ik} && \text{Property CMCN} \\
&= \sum_{k=1}^{n} [B^{t}]_{jk}[A^{t}]_{ki} && \text{Definition TM} \\
&= [B^{t}A^{t}]_{ji} && \text{Theorem EMP}
\end{align*}

So the matrices $(AB)^{t}$ and $B^{t}A^{t}$ are equal, entry-by-entry, and by the definition
of matrix equality (Definition ME) we can say they are equal matrices. ■


This  theorem  seems  odd  at  first  glance,  since  we  have  to  switch  the  order  of 
A and  B.  But  if  we  simply  consider  the  sizes  of  the  matrices  involved,  we  can  see 
that  the  switch  is  necessary  for  this  reason  alone.  That  the  individual  entries  of  the 
products  then  come  along  to  be  equal  is  a bonus. 

As  the  adjoint  of  a matrix  is  a composition  of  a conjugate  and  a transpose,  its 
interaction  with  matrix  multiplication  is  similar  to  that  of  a transpose.  Here  is  the 
last  of  our  long  list  of  basic  properties  of  matrix  multiplication. 

Theorem  MMAD  Matrix  Multiplication  and  Adjoints 

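Suppose $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix. Then $(AB)^{*} = B^{*}A^{*}$.

Proof.

\begin{align*}
(AB)^{*} &= \left(\overline{AB}\right)^{t} && \text{Definition A} \\
&= \left(\overline{A}\,\overline{B}\right)^{t} && \text{Theorem MMCC} \\
&= \overline{B}^{t}\,\overline{A}^{t} && \text{Theorem MMT} \\
&= B^{*}A^{*} && \text{Definition A}
\end{align*}

■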

Notice  how  none  of  these  proofs  above  relied  on  writing  out  huge  general  matrices 
with  lots  of  ellipses  (“...”)  and  trying  to  formulate  the  equalities  a whole  matrix  at 
a time. This messy business is a “proof technique” to be avoided at all costs. Notice
too  how  the  proof  of  Theorem  MMAD  does  not  use  an  entry-by-entry  approach, 
but  simply  builds  on  previous  results  about  matrix  multiplication’s  interaction  with 
conjugation  and  transposes. 

These  theorems,  along  with  Theorem  VSPM  and  the  other  results  in  Section 
MO,  give  you  the  “rules”  for  how  matrices  interact  with  the  various  operations 
we  have  defined  on  matrices  (addition,  scalar  multiplication,  matrix  multiplication, 
conjugation,  transposes  and  adjoints).  Use  them  and  use  them  often.  But  do  not  try 
to  do  anything  with  a matrix  that  you  do  not  have  a rule  for.  Together,  we  would 
informally  call  all  these  operations,  and  the  attendant  theorems,  “the  algebra  of 
matrices.” Notice, too, that every column vector is just an n x 1 matrix, so these
theorems  apply  to  column  vectors  also.  Finally,  these  results,  taken  as  a whole,  may 
make  us  feel  that  the  definition  of  matrix  multiplication  is  not  so  unnatural. 
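A numerical spot-check is no substitute for the proofs above, but it can make the “rules” feel concrete. The following is a minimal sketch in plain Python (not Sage); the helper names and the four sample matrices are our own, chosen only so that every product has compatible sizes. The entries are small Gaussian integers, so the floating-point complex arithmetic is exact here.

```python
def matmul(A, B):
    """Matrix product, entry-by-entry as in Theorem EMP."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(alpha, A):
    return [[alpha * a for a in row] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def conjugate(A):
    return [[complex(a).conjugate() for a in row] for row in A]

def adjoint(A):
    """Conjugate-transpose, as in Definition A."""
    return transpose(conjugate(A))

# Sample matrices (hypothetical data, sizes chosen to be compatible).
A = [[1 + 2j, 0, 3], [2, -1j, 1]]        # 2 x 3
B = [[1, 2j], [0, 1], [4 - 1j, -2]]      # 3 x 2
C = [[2, 0], [1j, 1], [3, 5]]            # 3 x 2
D = [[1, 1j, 0], [2, 0, -1]]             # 2 x 3

checks = {
    "MMDAA": matmul(A, add(B, C)) == add(matmul(A, B), matmul(A, C)),
    "MMSMM": scale(3j, matmul(A, B)) == matmul(scale(3j, A), B),
    "MMA":   matmul(A, matmul(B, D)) == matmul(matmul(A, B), D),
    "MMCC":  conjugate(matmul(A, B)) == matmul(conjugate(A), conjugate(B)),
    "MMT":   transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)),
    "MMAD":  adjoint(matmul(A, B)) == matmul(adjoint(B), adjoint(A)),
}
for name, ok in checks.items():
    print(name, ok)
```

Each line should report `True`. A single example proves nothing in general, of course, but a `False` here would signal a mistake in the code or in a hand computation.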

Subsection  HM 
Hermitian  Matrices 

The  adjoint  of  a matrix  has  a basic  property  when  employed  in  a matrix-vector 
product  as  part  of  an  inner  product.  At  this  point,  you  could  even  use  the  following 
result  as  a motivation  for  the  definition  of  an  adjoint. 

Theorem  AIP  Adjoint  and  Inner  Product 

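Suppose that $A$ is an $m \times n$ matrix and $\mathbf{x} \in \mathbb{C}^n$, $\mathbf{y} \in \mathbb{C}^m$. Then $\langle A\mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{x}, A^{*}\mathbf{y} \rangle$.

Proof.

\begin{align*}
\langle A\mathbf{x}, \mathbf{y} \rangle &= \left(\overline{A\mathbf{x}}\right)^{t} \mathbf{y} && \text{Theorem MMIP} \\
&= \left(\overline{A}\,\overline{\mathbf{x}}\right)^{t} \mathbf{y} && \text{Theorem MMCC} \\
&= \overline{\mathbf{x}}^{t}\, \overline{A}^{t}\, \mathbf{y} && \text{Theorem MMT} \\
&= \overline{\mathbf{x}}^{t} \left(A^{*}\mathbf{y}\right) && \text{Definition A} \\
&= \langle \mathbf{x}, A^{*}\mathbf{y} \rangle && \text{Theorem MMIP}
\end{align*}

■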

Sometimes a matrix is equal to its adjoint (Definition A), and these matrices
have interesting properties. One of the most common situations where this occurs
is when a matrix has only real number entries. For such a matrix the adjoint is just
the transpose, so then we are simply talking about symmetric matrices
(Definition SYM), and you can view this as a generalization of a symmetric matrix.

Definition  HM  Hermitian  Matrix 

The  square  matrix  A is  Hermitian  (or  self-adjoint)  if  A = A*.  □ 
Again, the set of real matrices that are Hermitian is exactly the set of symmetric
matrices.  In  Section  PEE  we  will  uncover  some  amazing  properties  of  Hermitian 
matrices,  so  when  you  get  there,  run  back  here  to  remind  yourself  of  this  definition. 
Further  properties  will  also  appear  in  Section  OD.  Right  now  we  prove  a fundamental 
result  about  Hermitian  matrices,  matrix  vector  products  and  inner  products.  As  a 
characterization,  this  could  be  employed  as  a definition  of  a Hermitian  matrix  and 
some  authors  take  this  approach. 

Theorem  HMIP  Hermitian  Matrices  and  Inner  Products 

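Suppose that $A$ is a square matrix of size $n$. Then $A$ is Hermitian if and only if
$\langle A\mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{x}, A\mathbf{y} \rangle$ for all $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$.

Proof. ($\Rightarrow$) This is the “easy half” of the proof, and makes the rationale for a
definition of Hermitian matrices most obvious. Assume $A$ is Hermitian,

\begin{align*}
\langle A\mathbf{x}, \mathbf{y} \rangle &= \langle \mathbf{x}, A^{*}\mathbf{y} \rangle && \text{Theorem AIP} \\
&= \langle \mathbf{x}, A\mathbf{y} \rangle && \text{Definition HM}
\end{align*}

($\Leftarrow$) This “half” will take a bit more work. Assume that $\langle A\mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{x}, A\mathbf{y} \rangle$ for
all $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$. We show that $A = A^{*}$ by establishing that $A\mathbf{x} = A^{*}\mathbf{x}$ for all $\mathbf{x}$, so we
can then apply Theorem EMMVP. With only this much motivation, consider the
inner product for any $\mathbf{x} \in \mathbb{C}^n$.

\begin{align*}
\langle A\mathbf{x} - A^{*}\mathbf{x}, A\mathbf{x} - A^{*}\mathbf{x} \rangle &= \langle A\mathbf{x} - A^{*}\mathbf{x}, A\mathbf{x} \rangle - \langle A\mathbf{x} - A^{*}\mathbf{x}, A^{*}\mathbf{x} \rangle && \text{Theorem IPVA} \\
&= \langle A\mathbf{x} - A^{*}\mathbf{x}, A\mathbf{x} \rangle - \langle A\left(A\mathbf{x} - A^{*}\mathbf{x}\right), \mathbf{x} \rangle && \text{Theorem AIP} \\
&= \langle A\mathbf{x} - A^{*}\mathbf{x}, A\mathbf{x} \rangle - \langle A\mathbf{x} - A^{*}\mathbf{x}, A\mathbf{x} \rangle && \text{Hypothesis} \\
&= 0 && \text{Property AICN}
\end{align*}

Because this first inner product equals zero, and has the same vector in each
argument ($A\mathbf{x} - A^{*}\mathbf{x}$), Theorem PIP gives the conclusion that $A\mathbf{x} - A^{*}\mathbf{x} = \mathbf{0}$. With
$A\mathbf{x} = A^{*}\mathbf{x}$ for all $\mathbf{x} \in \mathbb{C}^n$, Theorem EMMVP says $A = A^{*}$, which is the defining
property of a Hermitian matrix (Definition HM). ■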


So,  informally,  Hermitian  matrices  are  those  that  can  be  tossed  around  from  one 
side  of  an  inner  product  to  the  other  with  reckless  abandon.  We  will  see  later  what 
this  buys  us. 
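To see this “tossing around” on a concrete case, here is a small sketch in plain Python. The matrices `H` and `N` and the vectors `x`, `y` are made-up sample data, and `inner` follows the convention of Theorem MMIP, with the conjugate on the first argument. Entries are Gaussian integers, so the comparisons below are exact.

```python
def inner(u, v):
    """Inner product with the conjugate on the first argument (Theorem MMIP)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def matvec(A, x):
    """Matrix-vector product, one row of A at a time."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

def adjoint(A):
    """Conjugate-transpose of A (Definition A)."""
    return [[complex(a).conjugate() for a in col] for col in zip(*A)]

# H equals its adjoint, so it is Hermitian (Definition HM).
H = [[2, 1 - 1j, 3j],
     [1 + 1j, 0, 4],
     [-3j, 4, -1]]
# N is not Hermitian: its (1,2) entry is 2i but its (2,1) entry is 0.
N = [[1, 2j, 0],
     [0, 1, 0],
     [0, 0, 1]]

x = [1 + 1j, 2, -1j]
y = [3, 1j, 1 - 2j]

print(adjoint(H) == H)                                     # True
print(inner(matvec(H, x), y) == inner(x, matvec(H, y)))    # True

# For N the matrix cannot simply be moved across (these two differ) ...
e1, e2 = [1, 0, 0], [0, 1, 0]
print(inner(matvec(N, e1), e2), inner(e1, matvec(N, e2)))
# ... but Theorem AIP still holds once the adjoint is used.
print(inner(matvec(N, x), y) == inner(x, matvec(adjoint(N), y)))   # True
```

The pair `e1`, `e2` was picked to expose the one off-diagonal entry where `N` and its adjoint disagree.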


Reading  Questions 


1.  Form  the  matrix  vector  product  of 


'2 

3 

-1 

o' 

‘ 2 ‘ 

1 

— 2 

7 

3 

with 

1 

01  ° d 

1 

1 

5 

3 

2 
2. Multiply together the two matrices below (in the order given).

\begin{align*}
\begin{bmatrix} 2 & 3 & -1 & 0 \\ 1 & -2 & 7 & 3 \\ 1 & 5 & 3 & 2 \end{bmatrix}
\begin{bmatrix} 2 & 6 \\ -3 & -4 \\ 0 & 2 \\ 3 & -1 \end{bmatrix}
\end{align*}


3.  Rewrite  the  system  of  linear  equations  below  as  a vector  equality  and  using  a matrix- 
vector  product.  (This  question  does  not  ask  for  a solution  to  the  system.  But  it  does 
ask  you  to  express  the  system  of  equations  in  a new  form  using  tools  from  this  section.) 

\begin{align*}
2x_1 + 3x_2 - x_3 &= 0 \\
x_1 + 2x_2 + x_3 &= 3 \\
x_1 + 3x_2 + 3x_3 &= 7
\end{align*}


Exercises 

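C20† Compute the product of the two matrices below, $AB$. Do this using the definitions
of the matrix-vector product (Definition MVP) and the definition of matrix multiplication
(Definition MM).

\begin{align*}
A = \begin{bmatrix} 2 & 5 \\ -1 & 3 \\ 2 & -2 \end{bmatrix}
&&
B = \begin{bmatrix} 1 & 5 & -3 & 4 \\ 2 & 0 & 2 & -3 \end{bmatrix}
\end{align*}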


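C21† Compute the product $AB$ of the two matrices below using both the definition of
the matrix-vector product (Definition MVP) and the definition of matrix multiplication
(Definition MM).

\begin{align*}
A = \begin{bmatrix} 1 & 3 & 2 \\ -1 & 2 & 1 \\ 0 & 1 & 0 \end{bmatrix}
&&
B = \begin{bmatrix} 4 & 1 & 2 \\ 1 & 0 & 1 \\ 3 & 1 & 5 \end{bmatrix}
\end{align*}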

C22'  Compute  the  product  AB  of  the  two  matrices  below  using  both  the  definition  of 
the  matrix-vector  product  (Definition  MVP)  and  the  definition  of  matrix  multiplication 
(Definition  MM). 


A = 


1 

-2 


0 

1 


3 

6 


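C23† Compute the product $AB$ of the two matrices below using both the definition of
the matrix-vector product (Definition MVP) and the definition of matrix multiplication
(Definition MM).

\begin{align*}
A = \begin{bmatrix} 3 & 1 \\ 2 & 4 \\ 6 & 5 \\ 1 & 2 \end{bmatrix}
&&
B = \begin{bmatrix} -3 & 1 \\ 4 & 2 \end{bmatrix}
\end{align*}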
2 
C24† Compute the product $AB$ of the two matrices below.

\begin{align*}
A = \begin{bmatrix} 1 & 2 & 3 & -2 \\ 0 & 1 & -2 & -1 \\ 1 & 1 & 3 & 1 \end{bmatrix}
&&
B = \begin{bmatrix} 3 \\ 4 \\ 0 \\ 2 \end{bmatrix}
\end{align*}

Compute  the  product 

AB  of 

the  two  matrices  below. 

~—T 

'l 

2 

3 

-2 

q 

A = 

0 

1 

-2 

-1 

B = 

0 

1 

1 

1 

3 

1 

1 

1 

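C26† Compute the product $AB$ of the two matrices below using both the definition of
the matrix-vector product (Definition MVP) and the definition of matrix multiplication
(Definition MM).

\begin{align*}
A = \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 2 \end{bmatrix}
&&
B = \begin{bmatrix} 2 & -5 & -1 \\ 0 & 1 & 0 \\ -1 & 2 & 1 \end{bmatrix}
\end{align*}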

C30'  For  the  matrix  A = 
positive  integer  n. 


1 , find  A2,  A3,  A4.  Find  a general  formula  for  An  for  any 


C31'  For  the  matrix  A = 
any  positive  integer  n. 


1 , find  A2,  A3,  A4.  Find  a general  formula  for  An  for 


C32† For the matrix $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$, find $A^2$, $A^3$, $A^4$. Find a general formula for $A^n$
for any positive integer $n$.


C33† For the matrix $A = \begin{bmatrix} 0 & 1 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$, find $A^2$, $A^3$, $A^4$. Find a general formula for $A^n$
for any positive integer $n$.

T10† Suppose that $A$ is a square matrix and there is a vector, $\mathbf{b}$, such that $\mathcal{LS}(A, \mathbf{b})$ has
a unique solution. Prove that $A$ is nonsingular. Give a direct proof (perhaps appealing to
Theorem PSPHS) rather than just negating a sentence from the text discussing a similar
situation.


T12 The conclusion of Theorem AIP is $\langle A\mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{x}, A^{*}\mathbf{y} \rangle$. Use the same hypotheses,
and prove the similar conclusion: $\langle \mathbf{x}, A\mathbf{y} \rangle = \langle A^{*}\mathbf{x}, \mathbf{y} \rangle$. Two different approaches can be
based on an application of Theorem AIP. The first uses Theorem AA, while a second
uses Theorem IPAC. Can you provide two proofs?

T20  Prove  the  second  part  of  Theorem  MMZM. 

T21  Prove  the  second  part  of  Theorem  MMIM. 

T22  Prove  the  second  part  of  Theorem  MMDAA. 
T23† Prove the second part of Theorem MMSMM.

T31 Suppose that $A$ is an $m \times n$ matrix and $\mathbf{x}, \mathbf{y} \in \mathcal{N}(A)$. Prove that $\mathbf{x} + \mathbf{y} \in \mathcal{N}(A)$.
T32 Suppose that $A$ is an $m \times n$ matrix, $\alpha \in \mathbb{C}$, and $\mathbf{x} \in \mathcal{N}(A)$. Prove that $\alpha\mathbf{x} \in \mathcal{N}(A)$.
T35 Suppose that $A$ is an $n \times n$ matrix. Prove that $A^{*}A$ and $AA^{*}$ are Hermitian matrices.
T40† Suppose that $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix. Prove that the null
space of $B$ is a subset of the null space of $AB$, that is $\mathcal{N}(B) \subseteq \mathcal{N}(AB)$. Provide an example
where the opposite is false, in other words give an example where $\mathcal{N}(AB) \not\subseteq \mathcal{N}(B)$.

T41† Suppose that $A$ is an $n \times n$ nonsingular matrix and $B$ is an $n \times p$ matrix. Prove that
the null space of $B$ is equal to the null space of $AB$, that is $\mathcal{N}(B) = \mathcal{N}(AB)$. (Compare
with Exercise MM.T40.)

T50 Suppose $\mathbf{u}$ and $\mathbf{v}$ are any two solutions of the linear system $\mathcal{LS}(A, \mathbf{b})$. Prove that
$\mathbf{u} - \mathbf{v}$ is an element of the null space of $A$, that is, $\mathbf{u} - \mathbf{v} \in \mathcal{N}(A)$.

T51† Give a new proof of Theorem PSPHS replacing applications of Theorem SLSLC
with matrix-vector products (Theorem SLEMM).

T52† Suppose that $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$, $\mathbf{b} \in \mathbb{C}^m$ and $A$ is an $m \times n$ matrix. If $\mathbf{x}$, $\mathbf{y}$ and $\mathbf{x} + \mathbf{y}$ are
each a solution to the linear system $\mathcal{LS}(A, \mathbf{b})$, what can you say that is interesting about
$\mathbf{b}$? Form an implication with the existence of the three solutions as the hypothesis and an
interesting statement about $\mathcal{LS}(A, \mathbf{b})$ as the conclusion, and then give a proof.


Section  MISLE 

Matrix  Inverses  and  Systems  of  Linear  Equations 

The  inverse  of  a square  matrix,  and  solutions  to  linear  systems  with  square  coefficient 
matrices,  are  intimately  connected. 


Subsection  SI 
Solutions  and  Inverses 


We  begin  with  a familiar  example,  performed  in  a novel  way. 

Example  SABMI  Solutions  to  Archetype  B with  a matrix  inverse 
Archetype  B is  the  system  of  m = 3 linear  equations  in  n = 3 variables, 

\begin{align*}
-7x_1 - 6x_2 - 12x_3 &= -33 \\
5x_1 + 5x_2 + 7x_3 &= 24 \\
x_1 + 4x_3 &= 5
\end{align*}

By  Theorem  SLEMM  we  can  represent  this  system  of  equations  as 

Ax  = b 


where 


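\begin{align*}
A = \begin{bmatrix} -7 & -6 & -12 \\ 5 & 5 & 7 \\ 1 & 0 & 4 \end{bmatrix}
&&
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
&&
\mathbf{b} = \begin{bmatrix} -33 \\ 24 \\ 5 \end{bmatrix}
\end{align*}

Now, entirely unmotivated, we define the $3 \times 3$ matrix $B$,

\begin{align*}
B = \begin{bmatrix} -10 & -12 & -9 \\ \frac{13}{2} & 8 & \frac{11}{2} \\ \frac{5}{2} & 3 & \frac{5}{2} \end{bmatrix}
\end{align*}

and note the remarkable fact that

\begin{align*}
BA = \begin{bmatrix} -10 & -12 & -9 \\ \frac{13}{2} & 8 & \frac{11}{2} \\ \frac{5}{2} & 3 & \frac{5}{2} \end{bmatrix}
\begin{bmatrix} -7 & -6 & -12 \\ 5 & 5 & 7 \\ 1 & 0 & 4 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\end{align*}

Now apply this computation to the problem of solving the system of equations,

\begin{align*}
\mathbf{x} &= I_3\mathbf{x} && \text{Theorem MMIM} \\
&= (BA)\mathbf{x} && \text{Substitution} \\
&= B(A\mathbf{x}) && \text{Theorem MMA} \\
&= B\mathbf{b} && \text{Substitution}
\end{align*}

So we have

\begin{align*}
\mathbf{x} = B\mathbf{b} = \begin{bmatrix} -10 & -12 & -9 \\ \frac{13}{2} & 8 & \frac{11}{2} \\ \frac{5}{2} & 3 & \frac{5}{2} \end{bmatrix}
\begin{bmatrix} -33 \\ 24 \\ 5 \end{bmatrix}
= \begin{bmatrix} -3 \\ 5 \\ 2 \end{bmatrix}
\end{align*}

So with the help and assistance of $B$ we have been able to determine a solution
to the system represented by $A\mathbf{x} = \mathbf{b}$ through judicious use of matrix multiplication.
We know by Theorem NMUS that since the coefficient matrix in this example is
nonsingular, there would be a unique solution, no matter what the choice of $\mathbf{b}$. The
derivation above amplifies this result, since we were forced to conclude that $\mathbf{x} = B\mathbf{b}$
and the solution could not be anything else. You should notice that this argument
would hold for any particular choice of $\mathbf{b}$. △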
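The computation in this example is easy to replay with exact rational arithmetic. The sketch below uses Python's standard `fractions` module rather than Sage; the helper names `matmul` and `matvec` are our own, and the matrix `B` is the inverse of the coefficient matrix of Archetype B, entered by hand.

```python
from fractions import Fraction as F

# Coefficient matrix and vector of constants for Archetype B.
A = [[-7, -6, -12],
     [5, 5, 7],
     [1, 0, 4]]
b = [-33, 24, 5]

# The matrix B of the example, with exact fractional entries.
B = [[F(-10), F(-12), F(-9)],
     [F(13, 2), F(8), F(11, 2)],
     [F(5, 2), F(3), F(5, 2)]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(X, v):
    return [sum(a * vk for a, vk in zip(row, v)) for row in X]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(matmul(B, A) == I3)         # True: BA is the identity matrix
x = matvec(B, b)                  # x = Bb
print([int(entry) for entry in x])   # [-3, 5, 2]
print(matvec(A, x) == b)          # True: x really solves Ax = b
```

Using `Fraction` avoids any rounding, so the equality tests are honest checks rather than approximate ones.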

The  matrix  B of  the  previous  example  is  called  the  inverse  of  A.  When  A and  B 
are  combined  via  matrix  multiplication,  the  result  is  the  identity  matrix,  which  can 
be  inserted  “in  front”  of  x as  the  first  step  in  finding  the  solution.  This  is  entirely 
analogous  to  how  we  might  solve  a single  linear  equation  like  3x  = 12. 

\begin{align*}
x = 1x = \left(\frac{1}{3}(3)\right)x = \frac{1}{3}(3x) = \frac{1}{3}(12) = 4
\end{align*}

Here we have obtained a solution by employing the “multiplicative inverse” of
3, $3^{-1} = \frac{1}{3}$. This works fine for any scalar multiple of $x$, except for zero, since zero
does not have a multiplicative inverse. Consider separately the two linear equations,

\begin{align*}
0x = 12 && 0x = 0
\end{align*}

The  first  has  no  solutions,  while  the  second  has  infinitely  many  solutions.  For 
matrices,  it  is  all  just  a little  more  complicated.  Some  matrices  have  inverses,  some 
do  not.  And  when  a matrix  does  have  an  inverse,  just  how  would  we  compute  it? 
In  other  words,  just  where  did  that  matrix  B in  the  last  example  come  from?  Are 
there  other  matrices  that  might  have  worked  just  as  well? 


Subsection  IM 
Inverse  of  a Matrix 

Definition  MI  Matrix  Inverse 

Suppose $A$ and $B$ are square matrices of size $n$ such that $AB = I_n$ and $BA = I_n$.
Then $A$ is invertible and $B$ is the inverse of $A$. In this situation, we write $B = A^{-1}$.
□ 

Notice  that  if  B is  the  inverse  of  A,  then  we  can  just  as  easily  say  A is  the  inverse 
of  B,  or  A and  B are  inverses  of  each  other. 

Not every square matrix has an inverse. In Example SABMI the matrix $B$ is
the inverse of the coefficient matrix of Archetype B. To see this it only remains to
check that $AB = I_3$. What about Archetype A? It is an example of a square matrix
without an inverse.


Example  MWIAA  A matrix  without  an  inverse,  Archetype  A 
Consider  the  coefficient  matrix  from  Archetype  A, 


\begin{align*}
A = \begin{bmatrix} 1 & -1 & 2 \\ 2 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}
\end{align*}

Suppose that $A$ is invertible and does have an inverse, say $B$. Choose the vector
of constants

\begin{align*}
\mathbf{b} = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}
\end{align*}

and consider the system of equations $\mathcal{LS}(A, \mathbf{b})$. Just as in Example SABMI, this
vector equation would have the unique solution $\mathbf{x} = B\mathbf{b}$.

However, the system $\mathcal{LS}(A, \mathbf{b})$ is inconsistent. Form the augmented matrix
$[A \mid \mathbf{b}]$ and row-reduce to

\begin{align*}
\begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\end{align*}

which allows us to recognize the inconsistency by Theorem RCLS.

So the assumption of $A$’s inverse leads to a logical inconsistency (the system
cannot be both consistent and inconsistent), so our assumption is false. $A$ is not
invertible.

It is possible this example is less than satisfying. Just where did that particular
choice of the vector $\mathbf{b}$ come from anyway? Stay tuned for an application of the future
Theorem CSCS in Example CSAA. △
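The row-reduction in this example can be reproduced with a short Gauss-Jordan routine. This is only a sketch in plain Python with exact fractions: the function `rref` is our own minimal implementation (not a Sage command), and the final test simply looks for a row that is zero in every coefficient column but nonzero in the last column.

```python
from fractions import Fraction

def rref(M):
    """Reduced row-echelon form by Gauss-Jordan elimination, exact arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0  # index of the next pivot row
    for c in range(cols):
        if r == rows:
            break
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        lead = M[r][c]
        M[r] = [x / lead for x in M[r]]            # scale to a leading 1
        for i in range(rows):                      # clear the rest of column c
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [a - factor * p for a, p in zip(M[i], M[r])]
        r += 1
    return M

# Augmented matrix [A | b] for Archetype A with b = (1, 3, 2).
augmented = [[1, -1, 2, 1],
             [2, 1, 1, 3],
             [1, 1, 0, 2]]
R = rref(augmented)
for row in R:
    print([int(x) for x in row])

# A row of the form 0 0 0 | nonzero signals an inconsistent system.
inconsistent = any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in R)
print(inconsistent)   # True
```

The last row printed is `[0, 0, 0, 1]`, which is the inconsistency that Theorem RCLS detects.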


Let  us  look  at  one  more  matrix  inverse  before  we  embark  on  a more  systematic 
study. 

Example  MI  Matrix  inverse 
Consider  the  matrices, 


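\begin{align*}
A = \begin{bmatrix} 1 & 2 & 1 & 2 & 1 \\ -2 & -3 & 0 & -5 & -1 \\ 1 & 1 & 0 & 2 & 1 \\ -2 & -3 & -1 & -3 & -2 \\ -1 & -3 & -1 & -3 & 1 \end{bmatrix}
&&
B = \begin{bmatrix} -3 & 3 & 6 & -1 & -2 \\ 0 & -2 & -5 & -1 & 1 \\ 1 & 2 & 4 & 1 & -1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & -1 & -2 & 0 & 1 \end{bmatrix}
\end{align*}

Then

\begin{align*}
AB = \begin{bmatrix} 1 & 2 & 1 & 2 & 1 \\ -2 & -3 & 0 & -5 & -1 \\ 1 & 1 & 0 & 2 & 1 \\ -2 & -3 & -1 & -3 & -2 \\ -1 & -3 & -1 & -3 & 1 \end{bmatrix}
\begin{bmatrix} -3 & 3 & 6 & -1 & -2 \\ 0 & -2 & -5 & -1 & 1 \\ 1 & 2 & 4 & 1 & -1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & -1 & -2 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\end{align*}

and

\begin{align*}
BA = \begin{bmatrix} -3 & 3 & 6 & -1 & -2 \\ 0 & -2 & -5 & -1 & 1 \\ 1 & 2 & 4 & 1 & -1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & -1 & -2 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 & 2 & 1 \\ -2 & -3 & 0 & -5 & -1 \\ 1 & 1 & 0 & 2 & 1 \\ -2 & -3 & -1 & -3 & -2 \\ -1 & -3 & -1 & -3 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\end{align*}

so by Definition MI, we can say that $A$ is invertible and write $B = A^{-1}$. △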
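Checking both products by hand means computing fifty entries, so this is a natural place to let a machine do the arithmetic. The following sketch, in plain Python with a helper `matmul` of our own naming, verifies both conditions of Definition MI for the two matrices of this example.

```python
# The matrices A and B of Example MI, stored as lists of rows.
A = [[1, 2, 1, 2, 1],
     [-2, -3, 0, -5, -1],
     [1, 1, 0, 2, 1],
     [-2, -3, -1, -3, -2],
     [-1, -3, -1, -3, 1]]
B = [[-3, 3, 6, -1, -2],
     [0, -2, -5, -1, 1],
     [1, 2, 4, 1, -1],
     [1, 0, 1, 1, 0],
     [1, -1, -2, 0, 1]]

def matmul(X, Y):
    """Matrix product, entry-by-entry as in Theorem EMP."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

I5 = identity(5)
print(matmul(A, B) == I5)   # True
print(matmul(B, A) == I5)   # True
```

All entries are integers, so the equality tests are exact.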


We  will  now  concern  ourselves  less  with  whether  or  not  an  inverse  of  a matrix 
exists,  but  instead  with  how  you  can  find  one  when  it  does  exist.  In  Section  MINM 
we  will  have  some  theorems  that  allow  us  to  more  quickly  and  easily  determine  just 
when  a matrix  is  invertible. 


Subsection  CIM 

Computing  the  Inverse  of  a Matrix 


We  have  seen  that  the  matrices  from  Archetype  B and  Archetype  K both  have 
inverses,  but  these  inverse  matrices  have  just  dropped  from  the  sky.  How  would 
we  compute  an  inverse?  And  just  when  is  a matrix  invertible,  and  when  is  it  not? 
Writing a putative inverse with $n^2$ unknowns and solving the resultant $n^2$ equations
is one approach. Applying this approach to 2 x 2 matrices can get us somewhere, so
just  for  fun,  let  us  do  it. 


Theorem  TTMI  Two-by-Two  Matrix  Inverse 
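Suppose

\begin{align*}
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\end{align*}

Then $A$ is invertible if and only if $ad - bc \ne 0$. When $A$ is invertible, then

\begin{align*}
A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
\end{align*}
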

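Proof. ($\Leftarrow$) Assume that $ad - bc \ne 0$. We will use the definition of the inverse of a
matrix to establish that $A$ has an inverse (Definition MI). (Note that if $ad - bc \ne 0$
then the displayed formula for $A^{-1}$ is legitimate since we are not dividing by zero.)
Using this proposed formula for the inverse of $A$, we compute

\begin{align*}
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\left(\frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}\right)
= \frac{1}{ad - bc} \begin{bmatrix} ad - bc & 0 \\ 0 & ad - bc \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\end{align*}

and

\begin{align*}
\frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
= \frac{1}{ad - bc} \begin{bmatrix} ad - bc & 0 \\ 0 & ad - bc \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\end{align*}

By Definition MI this is sufficient to establish that $A$ is invertible, and that the
expression for $A^{-1}$ is correct.

($\Rightarrow$) Assume that $A$ is invertible, and proceed with a proof by contradiction
(Proof Technique CD), by assuming also that $ad - bc = 0$. This translates to $ad = bc$.
Let

\begin{align*}
B = \begin{bmatrix} e & f \\ g & h \end{bmatrix}
\end{align*}

be a putative inverse of $A$.
This means that

\begin{align*}
I_2 = AB = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} e & f \\ g & h \end{bmatrix}
= \begin{bmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{bmatrix}
\end{align*}

Working on the matrices on two ends of this equation, we will multiply the top
row by $c$ and the bottom row by $a$.

\begin{align*}
\begin{bmatrix} c & 0 \\ 0 & a \end{bmatrix}
= \begin{bmatrix} ace + bcg & acf + bch \\ ace + adg & acf + adh \end{bmatrix}
\end{align*}

We are assuming that $ad = bc$, so we can replace two occurrences of $ad$ by $bc$ in
the bottom row of the right matrix.

\begin{align*}
\begin{bmatrix} c & 0 \\ 0 & a \end{bmatrix}
= \begin{bmatrix} ace + bcg & acf + bch \\ ace + bcg & acf + bch \end{bmatrix}
\end{align*}

The matrix on the right now has two rows that are identical, and therefore the
same must be true of the matrix on the left. Identical rows for the matrix on the
left implies that $a = 0$ and $c = 0$.

With this information, the product $AB$ becomes

\begin{align*}
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2 = AB
= \begin{bmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{bmatrix}
= \begin{bmatrix} bg & bh \\ dg & dh \end{bmatrix}
\end{align*}

So $bg = dh = 1$ and thus $b$, $g$, $d$, $h$ are all nonzero. But then $bh$ and $dg$ (the “other
corners”) must also be nonzero, so this is (finally) a contradiction. So our assumption
was false and we see that $ad - bc \ne 0$ whenever $A$ has an inverse. ■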


There are several ways one could try to prove this theorem, but there is a continual
temptation to divide by one of the eight entries involved ($a$ through $h$), and we can
never be sure if these numbers are zero or not. This could lead to an analysis by
cases, which is messy, messy, messy. Note how the above proof never divides, but
always multiplies, and how zero/nonzero considerations are handled. Pay attention
to the expression $ad - bc$, as we will see it again in a while (Chapter D).
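Theorem TTMI translates directly into a small routine. The sketch below is plain Python with exact fractions; the function name `inverse_2x2` and its convention of returning `None` for a non-invertible matrix are our own choices for illustration.

```python
from fractions import Fraction

def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] by Theorem TTMI, or None if ad - bc = 0."""
    det = Fraction(a) * d - Fraction(b) * c
    if det == 0:
        return None          # not invertible, by Theorem TTMI
    s = 1 / det              # safe: det is nonzero here
    return [[d * s, -b * s],
            [-c * s, a * s]]

print(inverse_2x2(1, 2, 3, 4))   # entries -2, 1, 3/2, -1/2 (as Fractions)
print(inverse_2x2(1, 2, 2, 4))   # None, since (1)(4) - (2)(2) = 0
```

Note that the only division is by $ad - bc$, and it happens only after that quantity has been checked to be nonzero, mirroring the care taken in the proof.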

This  theorem  is  cute,  and  it  is  nice  to  have  a formula  for  the  inverse,  and  a condition 
that  tells  us  when  we  can  use  it.  However,  this  approach  becomes  impractical  for 
larger  matrices,  even  though  it  is  possible  to  demonstrate  that,  in  theory,  there  is 
a general  formula.  (Think  for  a minute  about  extending  this  result  to  just  3x3 
matrices.  For  starters,  we  need  18  letters!)  Instead,  we  will  work  column-by-column. 
Let  us  first  work  an  example  that  will  motivate  the  main  theorem  and  remove  some 
of  the  previous  mystery. 

Example  CMI  Computing  a matrix  inverse 
Consider  the  matrix  defined  in  Example  MI  as, 


' 1 

2 

1 

2 

1 ' 

-2 

—3 

0 

-5 

-1 

A = 

1 

1 

0 

2 

1 

-2 

—3 

-1 

-3 

-2 

-1 

-3 

-1 

-3 

1 

For  its  inverse,  we  desire  a matrix  B so  that  AB  = 1 5.  Emphasizing  the  structure 
of  the  columns  and  employing  the  definition  of  matrix  multiplication  Definition  MM, 

AB  = h 

A[Bi|B2|B3|B4|B5]  = [e1|e2|e3|e4|e5] 
[ABi|AB2|AB3|AB4|AB5]  = [ei|e2|e3|e4|e5] 

Equating  the  matrices  column-by-column  we  have 
ABi  = ex  AB2  = e2  AB3  = e3  AB4  = e4  AB5  = e3. 

Since  the  matrix  B is  what  we  are  trying  to  compute,  we  can  view  each  column, 
B8,  as  a column  vector  of  unknowns.  Then  we  have  five  systems  of  equations  to 
solve,  each  with  5 equations  in  5 variables.  Notice  that  all  5 of  these  systems  have 
the  same  coefficient  matrix.  We  will  now  solve  each  system  in  turn, 
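Row-reduce the augmented matrix of the linear system $\mathcal{LS}(A, \mathbf{e}_1)$,

\[
\begin{bmatrix}
1 & 2 & 1 & 2 & 1 & 1 \\
-2 & -3 & 0 & -5 & -1 & 0 \\
1 & 1 & 0 & 2 & 1 & 0 \\
-2 & -3 & -1 & -3 & -2 & 0 \\
-1 & -3 & -1 & -3 & 1 & 0
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & -3 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 1
\end{bmatrix};
\quad
\mathbf{B}_1 = \begin{bmatrix} -3 \\ 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}
\]

Row-reduce the augmented matrix of the linear system $\mathcal{LS}(A, \mathbf{e}_2)$,

\[
\begin{bmatrix}
1 & 2 & 1 & 2 & 1 & 0 \\
-2 & -3 & 0 & -5 & -1 & 1 \\
1 & 1 & 0 & 2 & 1 & 0 \\
-2 & -3 & -1 & -3 & -2 & 0 \\
-1 & -3 & -1 & -3 & 1 & 0
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 3 \\
0 & 1 & 0 & 0 & 0 & -2 \\
0 & 0 & 1 & 0 & 0 & 2 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & -1
\end{bmatrix};
\quad
\mathbf{B}_2 = \begin{bmatrix} 3 \\ -2 \\ 2 \\ 0 \\ -1 \end{bmatrix}
\]

Row-reduce the augmented matrix of the linear system $\mathcal{LS}(A, \mathbf{e}_3)$,

\[
\begin{bmatrix}
1 & 2 & 1 & 2 & 1 & 0 \\
-2 & -3 & 0 & -5 & -1 & 0 \\
1 & 1 & 0 & 2 & 1 & 1 \\
-2 & -3 & -1 & -3 & -2 & 0 \\
-1 & -3 & -1 & -3 & 1 & 0
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 6 \\
0 & 1 & 0 & 0 & 0 & -5 \\
0 & 0 & 1 & 0 & 0 & 4 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & -2
\end{bmatrix};
\quad
\mathbf{B}_3 = \begin{bmatrix} 6 \\ -5 \\ 4 \\ 1 \\ -2 \end{bmatrix}
\]

Row-reduce the augmented matrix of the linear system $\mathcal{LS}(A, \mathbf{e}_4)$,

\[
\begin{bmatrix}
1 & 2 & 1 & 2 & 1 & 0 \\
-2 & -3 & 0 & -5 & -1 & 0 \\
1 & 1 & 0 & 2 & 1 & 0 \\
-2 & -3 & -1 & -3 & -2 & 1 \\
-1 & -3 & -1 & -3 & 1 & 0
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & -1 \\
0 & 1 & 0 & 0 & 0 & -1 \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix};
\quad
\mathbf{B}_4 = \begin{bmatrix} -1 \\ -1 \\ 1 \\ 1 \\ 0 \end{bmatrix}
\]

Row-reduce the augmented matrix of the linear system $\mathcal{LS}(A, \mathbf{e}_5)$,

\[
\begin{bmatrix}
1 & 2 & 1 & 2 & 1 & 0 \\
-2 & -3 & 0 & -5 & -1 & 0 \\
1 & 1 & 0 & 2 & 1 & 0 \\
-2 & -3 & -1 & -3 & -2 & 0 \\
-1 & -3 & -1 & -3 & 1 & 1
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & -2 \\
0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & -1 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1
\end{bmatrix};
\quad
\mathbf{B}_5 = \begin{bmatrix} -2 \\ 1 \\ -1 \\ 0 \\ 1 \end{bmatrix}
\]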
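We can now collect our 5 solution vectors into the matrix $B$,

\[
B = [\mathbf{B}_1|\mathbf{B}_2|\mathbf{B}_3|\mathbf{B}_4|\mathbf{B}_5]
= \begin{bmatrix}
-3 & 3 & 6 & -1 & -2 \\
0 & -2 & -5 & -1 & 1 \\
1 & 2 & 4 & 1 & -1 \\
1 & 0 & 1 & 1 & 0 \\
1 & -1 & -2 & 0 & 1
\end{bmatrix}
\]

By this method, we know that $AB = I_5$. Check that $BA = I_5$, and then we will
know that we have the inverse of $A$. △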

Notice  how  the  five  systems  of  equations  in  the  preceding  example  were  all  solved 
by  exactly  the  same  sequence  of  row  operations.  Would  it  not  be  nice  to  avoid  this 
obvious  duplication  of  effort?  Our  main  theorem  for  this  section  follows,  and  it 
mimics  this  previous  example,  while  also  avoiding  all  the  overhead. 

Theorem  CINM  Computing  the  Inverse  of  a Nonsingular  Matrix 
Suppose $A$ is a nonsingular square matrix of size $n$. Create the $n \times 2n$ matrix $M$ by
placing the $n \times n$ identity matrix $I_n$ to the right of the matrix $A$. Let $N$ be a matrix
that is row-equivalent to $M$ and in reduced row-echelon form. Finally, let $J$ be the
matrix formed from the final $n$ columns of $N$. Then $AJ = I_n$.

Proof.  A is  nonsingular,  so  by  Theorem  NMRRI  there  is  a sequence  of  row  operations 
that  will  convert  A into  In.  It  is  this  same  sequence  of  row  operations  that  will 
convert  M into  N,  since  having  the  identity  matrix  in  the  first  n columns  of  N is 
sufficient  to  guarantee  that  N is  in  reduced  row-echelon  form. 

If we consider the systems of linear equations, $\mathcal{LS}(A, \mathbf{e}_i)$, $1 \le i \le n$, we see that
the aforementioned sequence of row operations will also bring the augmented matrix
of each of these systems into reduced row-echelon form. Furthermore, the unique
solution to $\mathcal{LS}(A, \mathbf{e}_i)$ appears in column $n + 1$ of the row-reduced augmented matrix
of the system and is identical to column $n + i$ of $N$. Let $\mathbf{N}_1, \mathbf{N}_2, \mathbf{N}_3, \ldots, \mathbf{N}_{2n}$ denote
the columns of $N$. So we find,

\begin{align*}
AJ &= A[\mathbf{N}_{n+1}|\mathbf{N}_{n+2}|\mathbf{N}_{n+3}|\ldots|\mathbf{N}_{n+n}] \\
&= [A\mathbf{N}_{n+1}|A\mathbf{N}_{n+2}|A\mathbf{N}_{n+3}|\ldots|A\mathbf{N}_{n+n}] && \text{Definition MM} \\
&= [\mathbf{e}_1|\mathbf{e}_2|\mathbf{e}_3|\ldots|\mathbf{e}_n] \\
&= I_n && \text{Definition IM}
\end{align*}

as desired. ■
We have to be just a bit careful here about both what this theorem says and
what  it  does  not  say.  If  A is  a nonsingular  matrix,  then  we  are  guaranteed  a matrix 
B such  that  AB  = In,  and  the  proof  gives  us  a process  for  constructing  B.  However, 
the  definition  of  the  inverse  of  a matrix  (Definition  MI)  requires  that  BA  = In  also. 
So  at  this  juncture  we  must  compute  the  matrix  product  in  the  “opposite”  order 
before  we  claim  B as  the  inverse  of  A.  However,  we  will  soon  see  that  this  is  always 
the  case,  in  Theorem  OSIS,  so  the  title  of  this  theorem  is  not  inaccurate. 

What  if  A is  singular?  At  this  point  we  only  know  that  Theorem  CINM  cannot 
be  applied.  The  question  of  A’s  inverse  is  still  open.  (But  see  Theorem  NI  in  the 
next  section.) 
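
The procedure of Theorem CINM can be sketched in a few lines of Python. This is only an illustrative sketch, not the book's Sage code: the function names `inverse_by_row_reduction` and `matmul` are our own, and exact rational arithmetic from the standard `fractions` module is assumed so that zero/nonzero tests are reliable. The function returns `None` when the first $n$ columns cannot be brought to the identity matrix, which is the situation where the theorem cannot be applied.

```python
from fractions import Fraction

def inverse_by_row_reduction(A):
    """Row-reduce M = [A | I_n] and return the final n columns of the result.

    Returns None if the first n columns do not reduce to I_n (A singular).
    """
    n = len(A)
    # Build the n x 2n matrix M with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Look for a nonzero entry in this column, at or below the diagonal.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]        # swap rows
        lead = M[col][col]
        M[col] = [x / lead for x in M[col]]        # scale to a leading 1
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                # Clear the rest of the column.
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# The matrix A of Example CMI.
A = [[1, 2, 1, 2, 1],
     [-2, -3, 0, -5, -1],
     [1, 1, 0, 2, 1],
     [-2, -3, -1, -3, -2],
     [-1, -3, -1, -3, 1]]
J = inverse_by_row_reduction(A)
I5 = [[int(i == j) for j in range(5)] for i in range(5)]
print(matmul(A, J) == I5)
```

One pass of row operations on the single wide matrix replaces the five separate row-reductions of Example CMI, which is exactly the saving the theorem describes.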

We  will  finish  by  computing  the  inverse  for  the  coefficient  matrix  of  Archetype 
B,  the  one  we  just  pulled  from  a hat  in  Example  SABMI.  There  are  more  examples 
in  the  Archetypes  (Archetypes)  to  practice  with,  though  notice  that  it  is  silly  to  ask 
for  the  inverse  of  a rectangular  matrix  (the  sizes  are  not  right)  and  not  every  square 
matrix  has  an  inverse  (remember  Example  MWIAA?). 


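Example CMIAB Computing a matrix inverse, Archetype B
Archetype B has a coefficient matrix given as

\[
B = \begin{bmatrix}
-7 & -6 & -12 \\
5 & 5 & 7 \\
1 & 0 & 4
\end{bmatrix}
\]

Exercising Theorem CINM we set

\[
M = \begin{bmatrix}
-7 & -6 & -12 & 1 & 0 & 0 \\
5 & 5 & 7 & 0 & 1 & 0 \\
1 & 0 & 4 & 0 & 0 & 1
\end{bmatrix}
\]

which row reduces to

\[
N = \begin{bmatrix}
1 & 0 & 0 & -10 & -12 & -9 \\
0 & 1 & 0 & \frac{13}{2} & 8 & \frac{11}{2} \\
0 & 0 & 1 & \frac{5}{2} & 3 & \frac{5}{2}
\end{bmatrix}
\]

So

\[
B^{-1} = \begin{bmatrix}
-10 & -12 & -9 \\
\frac{13}{2} & 8 & \frac{11}{2} \\
\frac{5}{2} & 3 & \frac{5}{2}
\end{bmatrix}
\]

once we check that $B^{-1}B = I_3$ (the product in the opposite order is a consequence
of the theorem). △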
Subsection PMI
Properties  of  Matrix  Inverses 

The  inverse  of  a matrix  enjoys  some  nice  properties.  We  collect  a few  here.  First,  a 
matrix  can  have  but  one  inverse. 

Theorem MIU Matrix Inverse is Unique

Suppose the square matrix $A$ has an inverse. Then $A^{-1}$ is unique.

Proof. As described in Proof Technique U, we will assume that $A$ has two inverses.
The hypothesis tells us there is at least one. Suppose then that $B$ and $C$ are both
inverses for $A$, so we know by Definition MI that $AB = BA = I_n$ and $AC = CA = I_n$.
Then we have,

\begin{align*}
B &= BI_n && \text{Theorem MMIM} \\
&= B(AC) && \text{Definition MI} \\
&= (BA)C && \text{Theorem MMA} \\
&= I_nC && \text{Definition MI} \\
&= C && \text{Theorem MMIM}
\end{align*}

So we conclude that $B$ and $C$ are the same, and cannot be different. So any
matrix that acts like an inverse, must be the inverse. ■


When  most  of  us  dress  in  the  morning,  we  put  on  our  socks  first,  followed  by  our 
shoes.  In  the  evening  we  must  then  first  remove  our  shoes,  followed  by  our  socks. 
Try  to  connect  the  conclusion  of  the  following  theorem  with  this  everyday  example. 

Theorem  SS  Socks  and  Shoes 

Suppose $A$ and $B$ are invertible matrices of size $n$. Then $AB$ is an invertible matrix
and $(AB)^{-1} = B^{-1}A^{-1}$.

Proof. At the risk of carrying our everyday analogies too far, the proof of this
theorem is quite easy when we compare it to the workings of a dating service. We
have a statement about the inverse of the matrix $AB$, which for all we know right
now might not even exist. Suppose $AB$ was to sign up for a dating service with
two requirements for a compatible date. Upon multiplication on the left, and on
the right, the result should be the identity matrix. In other words, $AB$'s ideal date
would be its inverse.

Now along comes the matrix $B^{-1}A^{-1}$ (which we know exists because our
hypothesis says both $A$ and $B$ are invertible and we can form the product of these
two matrices), also looking for a date. Let us see if $B^{-1}A^{-1}$ is a good match for
$AB$. First they meet at a noncommittal neutral location, say a coffee shop, for quiet
conversation:

\begin{align*}
(B^{-1}A^{-1})(AB) &= B^{-1}(A^{-1}A)B && \text{Theorem MMA} \\
&= B^{-1}I_nB && \text{Definition MI} \\
&= B^{-1}B && \text{Theorem MMIM} \\
&= I_n && \text{Definition MI}
\end{align*}

The first date having gone smoothly, a second, more serious, date is arranged, say
dinner and a show:

\begin{align*}
(AB)(B^{-1}A^{-1}) &= A(BB^{-1})A^{-1} && \text{Theorem MMA} \\
&= AI_nA^{-1} && \text{Definition MI} \\
&= AA^{-1} && \text{Theorem MMIM} \\
&= I_n && \text{Definition MI}
\end{align*}

So the matrix $B^{-1}A^{-1}$ has met all of the requirements to be $AB$'s inverse (date)
and with the ensuing marriage proposal we can announce that $(AB)^{-1} = B^{-1}A^{-1}$. ■
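
A quick numerical check of Theorem SS, and of why the order matters, can be sketched in Python. The two matrices below are hypothetical examples of our own choosing, and the helper `inv2` assumes the $ad - bc$ formula for the inverse of a $2 \times 2$ matrix with $ad - bc \neq 0$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(M):
    """Inverse of a 2 x 2 matrix via the ad - bc formula (assumes ad - bc != 0)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 2], [3, 5]]   # ad - bc = -1
B = [[2, 1], [7, 4]]   # ad - bc = 1

lhs = inv2(matmul(A, B))              # (AB)^{-1}
rhs = matmul(inv2(B), inv2(A))        # B^{-1} A^{-1}: shoes off first, then socks
wrong = matmul(inv2(A), inv2(B))      # A^{-1} B^{-1}: the tempting wrong order

print(lhs == rhs)
print(lhs == wrong)
```

For this pair the first comparison holds and the second fails, since these two matrices do not commute.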


Theorem MIMI Matrix Inverse of a Matrix Inverse

Suppose $A$ is an invertible matrix. Then $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$.

Proof. As with the proof of Theorem SS, we examine if $A$ is a suitable inverse for
$A^{-1}$ (by definition, the opposite is true).

\begin{align*}
AA^{-1} &= I_n && \text{Definition MI}
\end{align*}

and

\begin{align*}
A^{-1}A &= I_n && \text{Definition MI}
\end{align*}

The matrix $A$ has met all the requirements to be the inverse of $A^{-1}$, and so $A^{-1}$ is
invertible and we can write $A = (A^{-1})^{-1}$. ■

Theorem MIT Matrix Inverse of a Transpose

Suppose $A$ is an invertible matrix. Then $A^t$ is invertible and $(A^t)^{-1} = (A^{-1})^t$.

Proof. As with the proof of Theorem SS, we see if $(A^{-1})^t$ is a suitable inverse for
$A^t$. Apply Theorem MMT to see that

\begin{align*}
(A^{-1})^tA^t &= (AA^{-1})^t && \text{Theorem MMT} \\
&= I_n^t && \text{Definition MI} \\
&= I_n && \text{Definition SYM}
\end{align*}

and

\begin{align*}
A^t(A^{-1})^t &= (A^{-1}A)^t && \text{Theorem MMT} \\
&= I_n^t && \text{Definition MI} \\
&= I_n && \text{Definition SYM}
\end{align*}

The matrix $(A^{-1})^t$ has met all the requirements to be the inverse of $A^t$, and so $A^t$
is invertible and we can write $(A^t)^{-1} = (A^{-1})^t$. ■

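Theorem MISM Matrix Inverse of a Scalar Multiple

Suppose $A$ is an invertible matrix and $\alpha$ is a nonzero scalar. Then $(\alpha A)^{-1} = \frac{1}{\alpha}A^{-1}$
and $\alpha A$ is invertible.

Proof. As with the proof of Theorem SS, we see if $\frac{1}{\alpha}A^{-1}$ is a suitable inverse for $\alpha A$.

\begin{align*}
\left(\frac{1}{\alpha}A^{-1}\right)(\alpha A) &= \left(\frac{1}{\alpha}\alpha\right)\left(A^{-1}A\right) && \text{Theorem MMSMM} \\
&= 1I_n && \text{Scalar multiplicative inverses} \\
&= I_n && \text{Property OM}
\end{align*}

and

\begin{align*}
(\alpha A)\left(\frac{1}{\alpha}A^{-1}\right) &= \left(\alpha\frac{1}{\alpha}\right)\left(AA^{-1}\right) && \text{Theorem MMSMM} \\
&= 1I_n && \text{Scalar multiplicative inverses} \\
&= I_n && \text{Property OM}
\end{align*}

The matrix $\frac{1}{\alpha}A^{-1}$ has met all the requirements to be the inverse of $\alpha A$, so we
can write $(\alpha A)^{-1} = \frac{1}{\alpha}A^{-1}$. ■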


Notice that there are some likely theorems that are missing here. For example, it
would be tempting to think that $(A + B)^{-1} = A^{-1} + B^{-1}$, but this is false. Can you
find a counterexample? (See Exercise MISLE.T10.)
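
The properties that do hold (Theorem MIMI, Theorem MIT and Theorem MISM) are easy to spot-check on a small example. The sketch below is illustrative only: the matrix `A` and scalar `alpha` are arbitrary choices of ours, and `inv2` assumes the $ad - bc$ formula for $2 \times 2$ inverses. A check on one matrix is of course no substitute for the proofs.

```python
from fractions import Fraction

def inv2(M):
    """Inverse of a 2 x 2 matrix via the ad - bc formula (assumes ad - bc != 0)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def transpose(M):
    return [list(col) for col in zip(*M)]

def scale(alpha, M):
    return [[alpha * x for x in row] for row in M]

A = [[1, 2], [3, 5]]      # hypothetical invertible matrix, ad - bc = -1
alpha = Fraction(3)       # hypothetical nonzero scalar

mimi = inv2(inv2(A)) == A                                   # (A^{-1})^{-1} = A
mit = inv2(transpose(A)) == transpose(inv2(A))              # (A^t)^{-1} = (A^{-1})^t
mism = inv2(scale(alpha, A)) == scale(1 / alpha, inv2(A))   # (aA)^{-1} = (1/a) A^{-1}
print(mimi, mit, mism)
```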


Reading  Questions 


1. Compute the inverse of the matrix below.

\[
\begin{bmatrix} -2 & 3 \\ -3 & 4 \end{bmatrix}
\]

2. Compute the inverse of the matrix below.

\[
\begin{bmatrix} 2 & 3 & 1 \\ 1 & -2 & -3 \\ -2 & 4 & 6 \end{bmatrix}
\]

3.  Explain  why  Theorem  SS  has  the  title  it  does.  (Do  not  just  state  the  theorem,  explain 
the  choice  of  the  title  making  reference  to  the  theorem  itself.) 
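Exercises

C16† If it exists, find the inverse of $A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \\ 2 & -1 & 1 \end{bmatrix}$, and check your answer.

C17† If it exists, find the inverse of $A = \begin{bmatrix} 2 & -1 & 1 \\ 1 & 2 & 1 \\ 3 & 1 & 2 \end{bmatrix}$, and check your answer.

C18† If it exists, find the inverse of $A = \begin{bmatrix} 1 & 3 & 1 \\ 1 & 2 & 1 \\ 2 & 2 & 1 \end{bmatrix}$, and check your answer.

C19† If it exists, find the inverse of $A = \begin{bmatrix} 1 & 3 & 1 \\ 0 & 2 & 1 \\ 2 & 2 & 1 \end{bmatrix}$, and check your answer.

C21† Verify that $B$ is the inverse of $A$.

\[
A = \begin{bmatrix}
1 & 1 & -1 & 2 \\
-2 & -1 & 2 & -3 \\
1 & 1 & 0 & 2 \\
-1 & 2 & 0 & 2
\end{bmatrix}
\qquad
B = \begin{bmatrix}
4 & 2 & 0 & -1 \\
8 & 4 & -1 & -1 \\
-1 & 0 & 1 & 0 \\
-6 & -3 & 1 & 1
\end{bmatrix}
\]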

C22† Recycle the matrices $A$ and $B$ from Exercise MISLE.C21 and set


' 2 ' 
1 

-3 

2 


d = 


Employ the matrix $B$ to solve the two linear systems $\mathcal{LS}(A, \mathbf{c})$ and $\mathcal{LS}(A, \mathbf{d})$.

C23 If it exists, find the inverse of the $2 \times 2$ matrix

\[
A = \begin{bmatrix} 7 & 3 \\ 5 & 2 \end{bmatrix}
\]

and check your answer. (See Theorem TTMI.)

C24 If it exists, find the inverse of the $2 \times 2$ matrix

\[
A = \begin{bmatrix} 6 & 3 \\ 4 & 2 \end{bmatrix}
\]

and check your answer. (See Theorem TTMI.)

C25 At the conclusion of Example CMI, verify that $BA = I_5$ by computing the matrix
product.

C26† Let

\[
D = \begin{bmatrix}
1 & -1 & 3 & -2 & 1 \\
-2 & 3 & -5 & 3 & 0 \\
1 & -1 & 4 & -2 & 2 \\
-1 & 4 & -1 & 0 & 4 \\
1 & 0 & 5 & -2 & 5
\end{bmatrix}
\]

Compute the inverse of $D$, $D^{-1}$, by forming the $5 \times 10$ matrix $[D \mid I_5]$ and row-reducing
(Theorem CINM). Then use a calculator to compute $D^{-1}$ directly.

C27† Let

\[
E = \begin{bmatrix}
1 & -1 & 3 & -2 & 1 \\
-2 & 3 & -5 & 3 & -1 \\
1 & -1 & 4 & -2 & 2 \\
-1 & 4 & -1 & 0 & 2 \\
1 & 0 & 5 & -2 & 4
\end{bmatrix}
\]

Compute the inverse of $E$, $E^{-1}$, by forming the $5 \times 10$ matrix $[E \mid I_5]$ and row-reducing
(Theorem CINM). Then use a calculator to compute $E^{-1}$ directly.

C28† Let

\[
C = \begin{bmatrix}
1 & 1 & 3 & 1 \\
-2 & -1 & -4 & -1 \\
1 & 4 & 10 & 2 \\
-2 & 0 & -4 & 5
\end{bmatrix}
\]

Compute the inverse of $C$, $C^{-1}$, by forming the $4 \times 8$ matrix $[C \mid I_4]$ and row-reducing
(Theorem CINM). Then use a calculator to compute $C^{-1}$ directly.

C40† Find all solutions to the system of equations below, making use of the matrix inverse
found in Exercise MISLE.C28.

\begin{align*}
x_1 + x_2 + 3x_3 + x_4 &= -4 \\
-2x_1 - x_2 - 4x_3 - x_4 &= 4 \\
x_1 + 4x_2 + 10x_3 + 2x_4 &= -20 \\
-2x_1 - 4x_3 + 5x_4 &= 9
\end{align*}

C41† Use the inverse of a matrix to find all the solutions to the following system of
equations.

\begin{align*}
x_1 + 2x_2 - x_3 &= -3 \\
2x_1 + 5x_2 - x_3 &= -4 \\
-x_1 - 4x_2 &= 2
\end{align*}

C42† Use a matrix inverse to solve the linear system of equations.

\begin{align*}
x_1 - x_2 + 2x_3 &= 5 \\
x_1 - 2x_3 &= -8 \\
2x_1 - x_2 - x_3 &= -6
\end{align*}

T10† Construct an example to demonstrate that $(A + B)^{-1} = A^{-1} + B^{-1}$ is not true for
all square matrices $A$ and $B$ of the same size.


Section  MINM 

Matrix  Inverses  and  Nonsingular  Matrices 


We  saw  in  Theorem  CINM  that  if  a square  matrix  A is  nonsingular,  then  there 
is  a matrix  B so  that  AB  = In.  In  other  words,  B is  halfway  to  being  an  inverse 
of  A.  We  will  see  in  this  section  that  B automatically  fulfills  the  second  condition 
(BA  = In).  Example  MWIAA  showed  us  that  the  coefficient  matrix  from  Archetype 
A had  no  inverse.  Not  coincidentally,  this  coefficient  matrix  is  singular.  We  will  make 
all  these  connections  precise  now.  Not  many  examples  or  definitions  in  this  section, 
just  theorems. 

Subsection  NMI 

Nonsingular  Matrices  are  Invertible 

We  need  a couple  of  technical  results  for  starters.  Some  books  would  call  these  minor, 
but  essential,  results  “lemmas.”  We’ll  just  call  ’em  theorems.  See  Proof  Technique 
LC  for  more  on  the  distinction. 

The first of these technical results is interesting in that the hypothesis says
something about a product of two square matrices and the conclusion then says the
same thing about each individual matrix in the product. This result has an analogy
in the algebra of complex numbers: suppose $\alpha, \beta \in \mathbb{C}$, then $\alpha\beta \neq 0$ if and only if
$\alpha \neq 0$ and $\beta \neq 0$. We can view this result as suggesting that the term “nonsingular”
for matrices is like the term “nonzero” for scalars. Consider too that we know
singular matrices, as coefficient matrices for systems of equations, will sometimes
lead to systems with no solutions, or systems with infinitely many solutions (Theorem
NMUS). What do linear equations with zero look like? Consider $0x = 5$, which has
no solution, and $0x = 0$, which has infinitely many solutions. In the algebra of scalars,
zero is exceptional (meaning different, not better), and in the algebra of matrices,
singular matrices are also the exception. While there is only one zero scalar, and
there are infinitely many singular matrices, we will see that singular matrices are a
distinct minority.

Theorem  NPNT  Nonsingular  Product  has  Nonsingular  Terms 

Suppose  that  A and  B are  square  matrices  of  size  n.  The  product  AB  is  nonsingular 

if  and  only  if  A and  B are  both  nonsingular. 

Proof. (⇒) For this portion of the proof we will form the logically-equivalent
contrapositive and prove that statement using two cases. “$AB$ is nonsingular implies $A$
and $B$ are both nonsingular” becomes “$A$ or $B$ is singular implies $AB$ is singular.”
(Be sure to understand why the “and” became an “or”, see Proof Technique CP.)

Case 1. Suppose $B$ is singular. Then there is a nonzero vector $\mathbf{z}$ that is a solution
to $\mathcal{LS}(B, \mathbf{0})$. So

\begin{align*}
(AB)\mathbf{z} &= A(B\mathbf{z}) && \text{Theorem MMA} \\
&= A\mathbf{0} && \text{Theorem SLEMM} \\
&= \mathbf{0} && \text{Theorem MMZM}
\end{align*}

With Theorem SLEMM we can translate this vector equality to the statement
that $\mathbf{z}$ is a nonzero solution to $\mathcal{LS}(AB, \mathbf{0})$. Thus $AB$ is singular (Definition NM), as
desired.

Case 2. Suppose $A$ is singular, and $B$ is not singular. In other words, with Case
1 complete, we can be more precise about this remaining case and assume that $B$ is
nonsingular. Because $A$ is singular, there is a nonzero vector $\mathbf{y}$ that is a solution
to $\mathcal{LS}(A, \mathbf{0})$. Now consider the linear system $\mathcal{LS}(B, \mathbf{y})$. Since $B$ is nonsingular, the
system has a unique solution (Theorem NMUS), which we will denote as $\mathbf{w}$. We first
claim $\mathbf{w}$ is not the zero vector either. Assuming the opposite, suppose that $\mathbf{w} = \mathbf{0}$
(Proof Technique CD). Then

\begin{align*}
\mathbf{y} &= B\mathbf{w} && \text{Theorem SLEMM} \\
&= B\mathbf{0} && \text{Hypothesis} \\
&= \mathbf{0} && \text{Theorem MMZM}
\end{align*}

contrary to $\mathbf{y}$ being nonzero. So $\mathbf{w} \neq \mathbf{0}$. The pieces are in place, so here we go,

\begin{align*}
(AB)\mathbf{w} &= A(B\mathbf{w}) && \text{Theorem MMA} \\
&= A\mathbf{y} && \text{Theorem SLEMM} \\
&= \mathbf{0} && \text{Theorem SLEMM}
\end{align*}

With Theorem SLEMM we can translate this vector equality to the statement
that $\mathbf{w}$ is a nonzero solution to $\mathcal{LS}(AB, \mathbf{0})$. Thus $AB$ is singular (Definition NM),
as desired. And this conclusion holds for both cases.

(⇐) Now assume that both $A$ and $B$ are nonsingular. Suppose that $\mathbf{x} \in \mathbb{C}^n$ is a
solution to $\mathcal{LS}(AB, \mathbf{0})$. Then

\begin{align*}
\mathbf{0} &= (AB)\mathbf{x} && \text{Theorem SLEMM} \\
&= A(B\mathbf{x}) && \text{Theorem MMA}
\end{align*}

By Theorem SLEMM, $B\mathbf{x}$ is a solution to $\mathcal{LS}(A, \mathbf{0})$, and by the definition of a
nonsingular matrix (Definition NM), we conclude that $B\mathbf{x} = \mathbf{0}$. Now, by an entirely
similar argument, the nonsingularity of $B$ forces us to conclude that $\mathbf{x} = \mathbf{0}$. So
the only solution to $\mathcal{LS}(AB, \mathbf{0})$ is the zero vector and we conclude that $AB$ is
nonsingular by Definition NM. ■
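
The two cases of the contrapositive can be traced on a tiny example. The sketch below is only an illustration with matrices and vectors we chose by hand (none of them come from the text): `N` is nonsingular, `S` is singular, and in each case the vector built in the proof turns out to be a nonzero solution of the homogeneous system for the product.

```python
def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

N = [[1, 2], [3, 5]]   # nonsingular (hypothetical example)
S = [[1, 2], [2, 4]]   # singular: second row is twice the first

# Case 1: A = N, B = S. z is a nonzero solution to LS(B, 0).
z = [2, -1]
case1 = matvec(matmul(N, S), z)      # (AB)z = A(Bz) = A0 = 0

# Case 2: A = S, B = N. y is a nonzero solution to LS(A, 0),
# and w is the unique solution to LS(B, y), found by hand.
y = [2, -1]
w = [-12, 7]
case2 = matvec(matmul(S, N), w)      # (AB)w = A(Bw) = Ay = 0

print(matvec(S, z), case1)
print(matvec(N, w) == y, case2)
```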
This is a powerful result in the “forward” direction, because it allows us to begin
with a hypothesis that something complicated (the matrix product $AB$) has the
property  of  being  nonsingular,  and  we  can  then  conclude  that  the  simpler  constituents 
(A  and  B individually)  then  also  have  the  property  of  being  nonsingular.  If  we  had 
thought  that  the  matrix  product  was  an  artificial  construction,  results  like  this  would 
make  us  begin  to  think  twice. 

The  contrapositive  of  this  entire  result  is  equally  interesting.  It  says  that  A or  B 
(or  both)  is  a singular  matrix  if  and  only  if  the  product  AB  is  singular.  (See  Proof 
Technique  CP.) 

Theorem  OSIS  One-Sided  Inverse  is  Sufficient 

Suppose  A and  B are  square  matrices  of  size  n such  that  AB  = In.  Then  BA  = In. 


Proof. The matrix $I_n$ is nonsingular (since it row-reduces easily to $I_n$, Theorem
NMRRI). So $A$ and $B$ are nonsingular by Theorem NPNT, so in particular $B$ is
nonsingular. We can therefore apply Theorem CINM to assert the existence of a
matrix $C$ so that $BC = I_n$. This application of Theorem CINM could be a bit
confusing, mostly because of the names of the matrices involved. $B$ is nonsingular,
so there must be a “right-inverse” for $B$, and we are calling it $C$.

Now

\begin{align*}
BA &= (BA)I_n && \text{Theorem MMIM} \\
&= (BA)(BC) && \text{Theorem CINM} \\
&= B(AB)C && \text{Theorem MMA} \\
&= BI_nC && \text{Hypothesis} \\
&= BC && \text{Theorem MMIM} \\
&= I_n && \text{Theorem CINM}
\end{align*}

which is the desired conclusion. ■

So Theorem OSIS tells us that if $A$ is nonsingular, then the matrix $B$ guaranteed
by Theorem CINM will be both a “right-inverse” and a “left-inverse” for $A$, so $A$ is
invertible and $A^{-1} = B$.

So  if  you  have  a nonsingular  matrix,  A,  you  can  use  the  procedure  described 
in  Theorem  CINM  to  find  an  inverse  for  A.  If  A is  singular,  then  the  procedure 
in  Theorem  CINM  will  fail  as  the  first  n columns  of  M will  not  row-reduce  to 
the  identity  matrix.  However,  we  can  say  a bit  more.  When  A is  singular,  then  A 
does  not  have  an  inverse  (which  is  very  different  from  saying  that  the  procedure  in 
Theorem  CINM  fails  to  find  an  inverse).  This  may  feel  like  we  are  splitting  hairs, 
but  it  is  important  that  we  do  not  make  unfounded  assumptions.  These  observations 
motivate  the  next  theorem. 

Theorem  NI  Nonsingularity  is  Invertibility 

Suppose  that  A is  a square  matrix.  Then  A is  nonsingular  if  and  only  if  A is  invertible. 
Proof. (⇐) Since $A$ is invertible, we can write $I_n = AA^{-1}$ (Definition MI). Notice
that $I_n$ is nonsingular (Theorem NMRRI) so Theorem NPNT implies that $A$ (and
$A^{-1}$) is nonsingular.

(⇒) Suppose now that $A$ is nonsingular. By Theorem CINM we find $B$ so that
$AB = I_n$. Then Theorem OSIS tells us that $BA = I_n$. So $B$ is $A$'s inverse, and by
construction, $A$ is invertible. ■

So  for  a square  matrix,  the  properties  of  having  an  inverse  and  of  having  a trivial 
null  space  are  one  and  the  same.  Cannot  have  one  without  the  other. 

Theorem  NME3  Nonsingular  Matrix  Equivalences,  Round  3 
Suppose  that  A is  a square  matrix  of  size  n.  The  following  are  equivalent. 

1.  A is  nonsingular. 

2.  A row-reduces  to  the  identity  matrix. 

3. The null space of $A$ contains only the zero vector, $\mathcal{N}(A) = \{\mathbf{0}\}$.

4. The linear system $\mathcal{LS}(A, \mathbf{b})$ has a unique solution for every possible choice of $\mathbf{b}$.

5.  The  columns  of  A are  a linearly  independent  set. 

6.  A is  invertible. 

Proof.  We  can  update  our  list  of  equivalences  for  nonsingular  matrices  (Theorem 
NME2)  with  the  equivalent  condition  from  Theorem  NI.  ■ 

In  the  case  that  A is  a nonsingular  coefficient  matrix  of  a system  of  equations, 
the  inverse  allows  us  to  very  quickly  compute  the  unique  solution,  for  any  vector  of 
constants. 

Theorem  SNCM  Solution  with  Nonsingular  Coefficient  Matrix 

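Suppose that $A$ is nonsingular. Then the unique solution to $\mathcal{LS}(A, \mathbf{b})$ is $A^{-1}\mathbf{b}$.

Proof. By Theorem NMUS we know already that $\mathcal{LS}(A, \mathbf{b})$ has a unique solution
for every choice of $\mathbf{b}$. We need to show that the expression stated is indeed a solution
(the solution). That is easy, just “plug it in” to the vector equation representation
of the system (Theorem SLEMM),

\begin{align*}
A\left(A^{-1}\mathbf{b}\right) &= \left(AA^{-1}\right)\mathbf{b} && \text{Theorem MMA} \\
&= I_n\mathbf{b} && \text{Definition MI} \\
&= \mathbf{b} && \text{Theorem MMIM}
\end{align*}

Since $A\mathbf{x} = \mathbf{b}$ is true when we substitute $A^{-1}\mathbf{b}$ for $\mathbf{x}$, $A^{-1}\mathbf{b}$ is a (the!) solution
to $\mathcal{LS}(A, \mathbf{b})$. ■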
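
As a small illustration of Theorem SNCM, the sketch below multiplies an inverse by a vector of constants and then plugs the result back into the system. The matrix is the coefficient matrix of Archetype B with the inverse found in Example CMIAB; the vector `b` is simply an illustrative choice of ours, and `matvec` is our own helper.

```python
from fractions import Fraction as F

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[-7, -6, -12],
     [5, 5, 7],
     [1, 0, 4]]
A_inv = [[F(-10), F(-12), F(-9)],
         [F(13, 2), F(8), F(11, 2)],
         [F(5, 2), F(3), F(5, 2)]]

b = [-33, 24, 5]          # an illustrative vector of constants
x = matvec(A_inv, b)      # the candidate solution A^{-1} b

print(x == [-3, 5, 2])
print(matvec(A, x) == b)  # "plug it in": A(A^{-1} b) = b
```

Once the inverse is known, every new vector of constants costs only one matrix-vector product, which is the sense in which the inverse lets us solve "very quickly."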
Subsection UM
Unitary Matrices

Recall that the adjoint of a matrix is $A^* = \left(\overline{A}\right)^t$ (Definition A).

Definition UM Unitary Matrices

Suppose that $U$ is a square matrix of size $n$ such that $U^*U = I_n$. Then we say $U$ is
unitary. □

This  condition  may  seem  rather  far-fetched  at  first  glance.  Would  there  be  any 
matrix  that  behaved  this  way?  Well,  yes,  here  is  one. 

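Example UM3 Unitary matrix of size 3

\[
U = \begin{bmatrix}
\frac{1+i}{\sqrt{5}} & \frac{3+2i}{\sqrt{55}} & \frac{2+2i}{\sqrt{22}} \\
\frac{1-i}{\sqrt{5}} & \frac{2+2i}{\sqrt{55}} & \frac{-3+i}{\sqrt{22}} \\
\frac{i}{\sqrt{5}} & \frac{3-5i}{\sqrt{55}} & -\frac{2}{\sqrt{22}}
\end{bmatrix}
\]

The computations get a bit tiresome, but if you work your way through the
computation of $U^*U$, you will arrive at the $3 \times 3$ identity matrix $I_3$. △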


Unitary  matrices  do  not  have  to  look  quite  so  gruesome.  Here  is  a larger  one 
that  is  a bit  more  pleasing. 


Example UPM Unitary permutation matrix
The matrix

\[
P = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0
\end{bmatrix}
\]

is unitary as can be easily checked. Notice that it is just a rearrangement of the
columns of the $5 \times 5$ identity matrix, $I_5$ (Definition IM).

An interesting exercise is to build another $5 \times 5$ unitary matrix, $R$, using a
different rearrangement of the columns of $I_5$. Then form the product $PR$. This
will be another unitary matrix (Exercise MINM.T10). If you were to build all
$5! = 5 \times 4 \times 3 \times 2 \times 1 = 120$ matrices of this type you would have a set that
remains closed under matrix multiplication. It is an example of another algebraic
structure known as a group since together the set and the one operation (matrix
multiplication here) is closed, associative, has an identity ($I_5$), and inverses (Theorem
UMI). Notice though that the operation in this group is not commutative! △
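
The claims in this example are easy to test by machine. In the sketch below, `P` is the matrix of Example UPM, while `R` is a hypothetical second rearrangement of our own (the first two columns of $I_5$ swapped); the helper names are also ours. Since the entries are real, the adjoint is just the transpose.

```python
def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def is_permutation_matrix(M):
    """True if every entry is 0 or 1 and each row and column holds exactly one 1."""
    n = len(M)
    entries_ok = all(x in (0, 1) for row in M for x in row)
    rows_ok = all(sum(row) == 1 for row in M)
    cols_ok = all(sum(M[i][j] for i in range(n)) == 1 for j in range(n))
    return entries_ok and rows_ok and cols_ok

I5 = [[int(i == j) for j in range(5)] for i in range(5)]
P = [[0, 1, 0, 0, 0],
     [0, 0, 0, 1, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 1, 0, 0]]
R = [[0, 1, 0, 0, 0],    # hypothetical: I5 with its first two columns swapped
     [1, 0, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 1]]

PR, RP = matmul(P, R), matmul(R, P)
print(matmul(transpose(P), P) == I5)    # P*P = I5, so P is unitary
print(is_permutation_matrix(PR))        # the product is again a rearrangement
print(matmul(transpose(PR), PR) == I5)  # and again unitary
print(PR == RP)                         # but the order of the factors matters
```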


If a matrix $A$ has only real number entries (we say it is a real matrix) then
the defining property of being unitary simplifies to $A^tA = I_n$. In this case we, and
everybody else, call the matrix orthogonal, so you may often encounter this term
in your other reading when the complex numbers are not under consideration.

Unitary  matrices  have  easily  computed  inverses.  They  also  have  columns  that 
form  orthonormal  sets.  Here  are  the  theorems  that  show  us  that  unitary  matrices 
are  not  as  strange  as  they  might  initially  appear. 

Theorem  UMI  Unitary  Matrices  are  Invertible 

Suppose that $U$ is a unitary matrix of size $n$. Then $U$ is nonsingular, and $U^{-1} = U^*$.

Proof.  By  Definition  UM,  we  know  that  U*U  = In.  The  matrix  In  is  nonsingular 
(since  it  row-reduces  easily  to  In,  Theorem  NMRRI).  So  by  Theorem  NPNT,  U and 
U*  are  both  nonsingular  matrices. 

The  equation  U*U  = In  gets  us  halfway  to  an  inverse  of  U,  and  Theorem  OSIS 
tells  us  that  then  UU*  = In  also.  So  U and  U*  are  inverses  of  each  other  (Definition 
MI).  ■ 

Theorem  CUMOS  Columns  of  Unitary  Matrices  are  Orthonormal  Sets 
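Suppose that $S = \{\mathbf{A}_1, \mathbf{A}_2, \mathbf{A}_3, \ldots, \mathbf{A}_n\}$ is the set of columns of a square matrix $A$
of size $n$. Then $A$ is a unitary matrix if and only if $S$ is an orthonormal set.

Proof. The proof revolves around recognizing that a typical entry of the product
$A^*A$ is an inner product of columns of $A$. Here are the details to support this claim.

\begin{align*}
[A^*A]_{ij} &= \sum_{k=1}^{n} [A^*]_{ik}[A]_{kj} && \text{Theorem EMP} \\
&= \sum_{k=1}^{n} \left[\left(\overline{A}\right)^t\right]_{ik}[A]_{kj} && \text{Theorem EMP} \\
&= \sum_{k=1}^{n} \left[\overline{A}\right]_{ki}[A]_{kj} && \text{Definition TM} \\
&= \sum_{k=1}^{n} \overline{[A]_{ki}}\,[A]_{kj} && \text{Definition CCM} \\
&= \sum_{k=1}^{n} \overline{[\mathbf{A}_i]_k}\,[\mathbf{A}_j]_k \\
&= \langle \mathbf{A}_i, \mathbf{A}_j \rangle && \text{Definition IP}
\end{align*}

We now employ this equality in a chain of equivalences,

\begin{align*}
& S = \{\mathbf{A}_1, \mathbf{A}_2, \mathbf{A}_3, \ldots, \mathbf{A}_n\} \text{ is an orthonormal set} \\
\iff{} & \langle \mathbf{A}_i, \mathbf{A}_j \rangle = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} && \text{Definition ONS} \\
\iff{} & [A^*A]_{ij} = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} \\
\iff{} & [A^*A]_{ij} = [I_n]_{ij}, \quad 1 \le i \le n,\ 1 \le j \le n && \text{Definition IM} \\
\iff{} & A^*A = I_n && \text{Definition ME} \\
\iff{} & A \text{ is a unitary matrix} && \text{Definition UM}
\end{align*}
■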
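
The identity at the heart of this proof, that entry $(i, j)$ of $A^*A$ is the inner product of columns $i$ and $j$, can be checked numerically. The sketch below uses the matrix of Example UM3 with floating-point complex arithmetic, so equality is tested only up to a small tolerance (the value `1e-12` is our choice). The function `inner` conjugates its first argument, which we assume matches Definition IP; for an orthonormality check the choice of argument does not matter.

```python
import math

s5, s55, s22 = math.sqrt(5), math.sqrt(55), math.sqrt(22)
U = [[(1 + 1j) / s5, (3 + 2j) / s55, (2 + 2j) / s22],
     [(1 - 1j) / s5, (2 + 2j) / s55, (-3 + 1j) / s22],
     [1j / s5,       (3 - 5j) / s55, complex(-2) / s22]]

def inner(u, v):
    """Inner product of two complex vectors, conjugating the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

cols = [list(c) for c in zip(*U)]          # the columns of U as vectors

# gram[i][j] = <U_i, U_j>, which the proof shows equals [U*U]_{ij}.
gram = [[inner(cols[i], cols[j]) for j in range(3)] for i in range(3)]

is_identity = all(abs(gram[i][j] - (1 if i == j else 0)) < 1e-12
                  for i in range(3) for j in range(3))
print(is_identity)
```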


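Example OSMC Orthonormal set from matrix columns
The matrix

\[
U = \begin{bmatrix}
\frac{1+i}{\sqrt{5}} & \frac{3+2i}{\sqrt{55}} & \frac{2+2i}{\sqrt{22}} \\
\frac{1-i}{\sqrt{5}} & \frac{2+2i}{\sqrt{55}} & \frac{-3+i}{\sqrt{22}} \\
\frac{i}{\sqrt{5}} & \frac{3-5i}{\sqrt{55}} & -\frac{2}{\sqrt{22}}
\end{bmatrix}
\]

from Example UM3 is a unitary matrix. By Theorem CUMOS, its columns

\[
\left\{
\begin{bmatrix} \frac{1+i}{\sqrt{5}} \\ \frac{1-i}{\sqrt{5}} \\ \frac{i}{\sqrt{5}} \end{bmatrix},\
\begin{bmatrix} \frac{3+2i}{\sqrt{55}} \\ \frac{2+2i}{\sqrt{55}} \\ \frac{3-5i}{\sqrt{55}} \end{bmatrix},\
\begin{bmatrix} \frac{2+2i}{\sqrt{22}} \\ \frac{-3+i}{\sqrt{22}} \\ -\frac{2}{\sqrt{22}} \end{bmatrix}
\right\}
\]

form an orthonormal set. You might find checking the six inner products of pairs
of these vectors easier than doing the matrix product $U^*U$. Or, because the inner
product is anti-commutative (Theorem IPAC) you only need check three inner
products (see Exercise MINM.T12). △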


When  using  vectors  and  matrices  that  only  have  real  number  entries,  orthogonal 
matrices  are  those  matrices  with  inverses  that  equal  their  transpose.  Similarly,  the 
inner  product  is  the  familiar  dot  product.  Keep  this  special  case  in  mind  as  you  read 
the  next  theorem. 


Theorem  UMPIP  Unitary  Matrices  Preserve  Inner  Products 

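Suppose that $U$ is a unitary matrix of size $n$ and $\mathbf{u}$ and $\mathbf{v}$ are two vectors from $\mathbb{C}^n$.
Then

\[
\langle U\mathbf{u}, U\mathbf{v} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle
\qquad \text{and} \qquad
\|U\mathbf{v}\| = \|\mathbf{v}\|
\]

Proof.

\begin{align*}
\langle U\mathbf{u}, U\mathbf{v} \rangle &= \left(\overline{U\mathbf{u}}\right)^t U\mathbf{v} && \text{Theorem MMIP} \\
&= \left(\overline{U}\,\overline{\mathbf{u}}\right)^t U\mathbf{v} && \text{Theorem MMCC} \\
&= \overline{\mathbf{u}}^t\,\overline{U}^t U\mathbf{v} && \text{Theorem MMT} \\
&= \overline{\mathbf{u}}^t U^*U\mathbf{v} && \text{Definition A} \\
&= \overline{\mathbf{u}}^t I_n\mathbf{v} && \text{Definition UM} \\
&= \overline{\mathbf{u}}^t \mathbf{v} && \text{Theorem MMIM} \\
&= \langle \mathbf{u}, \mathbf{v} \rangle && \text{Theorem MMIP}
\end{align*}

The second conclusion is just a specialization of the first conclusion.

\begin{align*}
\|U\mathbf{v}\| &= \sqrt{\|U\mathbf{v}\|^2} \\
&= \sqrt{\langle U\mathbf{v}, U\mathbf{v} \rangle} && \text{Theorem IPN} \\
&= \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle} \\
&= \sqrt{\|\mathbf{v}\|^2} && \text{Theorem IPN} \\
&= \|\mathbf{v}\|
\end{align*}
■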


Aside  from  the  inherent  interest  in  this  theorem,  it  makes  a bigger  statement 
about  unitary  matrices.  When  we  view  vectors  geometrically  as  directions  or  forces, 
then  the  norm  equates  to  a notion  of  length.  If  we  transform  a vector  by  multiplication 
with  a unitary  matrix,  then  the  length  (norm)  of  that  vector  stays  the  same.  If  we 
consider  column  vectors  with  two  or  three  slots  containing  only  real  numbers,  then 
the  inner  product  of  two  such  vectors  is  just  the  dot  product,  and  this  quantity  can 
be  used  to  compute  the  angle  between  two  vectors.  When  two  vectors  are  multiplied 
(transformed)  by  the  same  unitary  matrix,  their  dot  product  is  unchanged  and  their 
individual  lengths  are  unchanged.  This  results  in  the  angle  between  the  two  vectors 
remaining  unchanged. 

A “unitary transformation” (a matrix-vector product with a unitary matrix) thus
preserves geometrical relationships among vectors representing directions, forces, or
other  physical  quantities.  In  the  case  of  a two-slot  vector  with  real  entries,  this  is 
simply  a rotation.  These  sorts  of  computations  are  exceedingly  important  in  computer 
graphics  such  as  games  and  real-time  simulations,  especially  when  increased  realism 
is  achieved  by  performing  many  such  computations  quickly.  We  will  see  unitary 
matrices  again  in  subsequent  sections  (especially  Theorem  OD)  and  in  each  instance, 
consider  the  interpretation  of  the  unitary  matrix  as  a sort  of  geometry-preserving 
transformation.  Some  authors  use  the  term  isometry  to  highlight  this  behavior.  We 
will  speak  loosely  of  a unitary  matrix  as  being  a sort  of  generalized  rotation. 
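
For a concrete feel, here is a small sketch with a real $2 \times 2$ rotation matrix, which is orthogonal and hence unitary. The angle and the two vectors are arbitrary values chosen for illustration, and comparisons are made up to floating-point tolerance. The dot product, both lengths, and therefore the angle between the vectors all survive the transformation.

```python
import math

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def angle(u, v):
    """Angle between two nonzero real vectors, in radians."""
    return math.acos(dot(u, v) / (norm(u) * norm(v)))

theta = 0.7                       # arbitrary rotation angle (illustrative)
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

u = [3.0, 4.0]
v = [-1.0, 2.0]
Qu, Qv = matvec(Q, u), matvec(Q, v)

print(math.isclose(dot(Qu, Qv), dot(u, v)))      # inner product preserved
print(math.isclose(norm(Qu), norm(u)))           # length preserved
print(math.isclose(angle(Qu, Qv), angle(u, v)))  # angle preserved
```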

A final  reminder:  the  terms  “dot  product,”  “symmetric  matrix”  and  “orthogonal 
matrix”  used  in  reference  to  vectors  or  matrices  with  real  number  entries  are  special 
cases  of  the  terms  “inner  product,”  “Hermitian  matrix”  and  “unitary  matrix”  that 
we  use  for  vectors  or  matrices  with  complex  number  entries,  so  keep  that  in  mind  as 
you  read  elsewhere. 


Reading  Questions 
1. Compute the inverse of the coefficient matrix of the system of equations below and use
the inverse to solve the system.

$$\begin{aligned} 4x_1 + 10x_2 &= 12 \\ 2x_1 + 6x_2 &= 4 \end{aligned}$$


2.  In  the  reading  questions  for  Section  MISLE  you  were  asked  to  find  the  inverse  of  the 
3x3  matrix  below. 


$$\begin{bmatrix} 2 & 3 & 1 \\ 1 & -2 & -3 \\ -2 & 4 & 6 \end{bmatrix}$$

Because  the  matrix  was  not  nonsingular,  you  had  no  theorems  at  that  point  that  would 
allow  you  to  compute  the  inverse.  Explain  why  you  now  know  that  the  inverse  does 
not  exist  (which  is  different  than  not  being  able  to  compute  it)  by  quoting  the  relevant 
theorem’s  acronym. 


3.  Is  the  matrix  A unitary?  Why? 


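$$A = \begin{bmatrix} \frac{1}{\sqrt{22}}(4 + 2i) & \frac{1}{\sqrt{374}}(5 + 3i) \\ \frac{1}{\sqrt{22}}(-1 - i) & \frac{1}{\sqrt{374}}(12 + 14i) \end{bmatrix}$$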


Exercises 


C20 Let $A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} -1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 1 \end{bmatrix}$. Verify that $AB$ is nonsingular.


C40† Solve the system of equations below using the inverse of a matrix.

$$\begin{aligned} x_1 + x_2 + 3x_3 + x_4 &= 5 \\ -2x_1 - x_2 - 4x_3 - x_4 &= -7 \\ x_1 + 4x_2 + 10x_3 + 2x_4 &= 9 \\ -2x_1 - 4x_3 + 5x_4 &= 9 \end{aligned}$$


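M10† Find values of $x$, $y$, $z$ so that the matrix $A = \begin{bmatrix} 1 & 2 & x \\ 3 & 0 & y \\ 1 & 1 & z \end{bmatrix}$ is invertible.

M11† Find values of $x$, $y$, $z$ so that the matrix $A = \begin{bmatrix} 1 & x & 1 \\ 1 & y & 4 \\ 0 & z & 5 \end{bmatrix}$ is singular.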


M15† If $A$ and $B$ are $n \times n$ matrices, $A$ is nonsingular, and $B$ is singular, show directly
that $AB$ is singular, without using Theorem NPNT.

M20† Construct an example of a $4 \times 4$ unitary matrix.

M80† Matrix multiplication interacts nicely with many operations, but not always with
transforming a matrix to reduced row-echelon form. Suppose that $A$ is an $m \times n$ matrix
and $B$ is an $n \times p$ matrix. Let $P$ be a matrix that is row-equivalent to $A$ and in reduced
row-echelon form, $Q$ be a matrix that is row-equivalent to $B$ and in reduced row-echelon
form, and let $R$ be a matrix that is row-equivalent to $AB$ and in reduced row-echelon form.
Is $PQ = R$? (In other words, with nonstandard notation, is $\mathrm{rref}(A)\,\mathrm{rref}(B) = \mathrm{rref}(AB)$?)

Construct a counterexample to show that, in general, this statement is false. Then find a
large class of matrices where if $A$ and $B$ are in the class, then the statement is true.

T10  Suppose  that  Q and  P are  unitary  matrices  of  size  n.  Prove  that  QP  is  a unitary 
matrix. 

T11 Prove that Hermitian matrices (Definition HM) have real entries on the diagonal.
More precisely, suppose that $A$ is a Hermitian matrix of size $n$. Then $[A]_{ii} \in \mathbb{R}$, $1 \leq i \leq n$.

T12 Suppose that we are checking if a square matrix of size $n$ is unitary. Show that
a straightforward application of Theorem CUMOS requires the computation of $n^2$ inner
products when the matrix is unitary, and fewer when the matrix is not unitary. Then
show that this maximum number of inner products can be reduced to $\frac{1}{2}n(n + 1)$ in light of
Theorem IPAC.

T25 The notation $A^k$ means a repeated matrix product between $k$ copies of the square
matrix $A$.

1. Assume $A$ is an $n \times n$ matrix where $A^2 = \mathcal{O}$ (which does not imply that $A = \mathcal{O}$).
Prove that $I_n - A$ is invertible by showing that $I_n + A$ is an inverse of $I_n - A$.

2. Assume that $A$ is an $n \times n$ matrix where $A^3 = \mathcal{O}$. Prove that $I_n - A$ is invertible.

3. Form a general theorem based on your observations from parts (1) and (2) and
provide a proof.


Section  CRS 

Column  and  Row  Spaces 


A matrix-vector  product  (Definition  MVP)  is  a linear  combination  of  the  columns 
of  the  matrix  and  this  allows  us  to  connect  matrix  multiplication  with  systems  of 
equations  via  Theorem  SLSLC.  Row  operations  are  linear  combinations  of  the  rows 
of  a matrix,  and  of  course,  reduced  row-echelon  form  (Definition  RREF)  is  also 
intimately  related  to  solving  systems  of  equations.  In  this  section  we  will  formalize 
these  ideas  with  two  key  definitions  of  sets  of  vectors  derived  from  a matrix. 

Subsection  CSSE 

Column  Spaces  and  Systems  of  Equations 

Theorem  SLSLC  showed  us  that  there  is  a natural  correspondence  between  solutions 
to  linear  systems  and  linear  combinations  of  the  columns  of  the  coefficient  matrix. 
This  idea  motivates  the  following  important  definition. 

Definition  CSM  Column  Space  of  a Matrix 

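Suppose that $A$ is an $m \times n$ matrix with columns $\mathbf{A}_1, \mathbf{A}_2, \mathbf{A}_3, \ldots, \mathbf{A}_n$. Then
the column space of $A$, written $\mathcal{C}(A)$, is the subset of $\mathbb{C}^m$ containing all linear
combinations of the columns of $A$,

$$\mathcal{C}(A) = \left\langle \{\mathbf{A}_1, \mathbf{A}_2, \mathbf{A}_3, \ldots, \mathbf{A}_n\} \right\rangle$$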

□ 

Some  authors  refer  to  the  column  space  of  a matrix  as  the  range,  but  we  will 
reserve  this  term  for  use  with  linear  transformations  (Definition  RLT). 

Upon  encountering  any  new  set,  the  first  question  we  ask  is  what  objects  are  in 
the  set,  and  which  objects  are  not?  Here  is  an  example  of  one  way  to  answer  this 
question,  and  it  will  motivate  a theorem  that  will  then  answer  the  question  precisely. 

Example  CSMCS  Column  space  of  a matrix  and  consistent  systems 
Archetype  D and  Archetype  E are  linear  systems  of  equations,  with  an  identical 
3x4  coefficient  matrix,  which  we  call  A here.  However,  Archetype  D is  consistent, 
while  Archetype  E is  not.  We  can  explain  this  difference  by  employing  the  column 
space  of  the  matrix  A. 

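The column vector of constants, $\mathbf{b}$, in Archetype D is given below, and one
solution listed for $\mathcal{LS}(A, \mathbf{b})$ is $\mathbf{x}$,

$$\mathbf{b} = \begin{bmatrix} 8 \\ -12 \\ 4 \end{bmatrix} \qquad \mathbf{x} = \begin{bmatrix} 7 \\ 8 \\ 1 \\ 3 \end{bmatrix}$$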



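By Theorem SLSLC, we can summarize this solution as a linear combination of
the columns of $A$ that equals $\mathbf{b}$,

$$7\begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix} + 8\begin{bmatrix} 1 \\ 4 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 7 \\ -5 \\ 4 \end{bmatrix} + 3\begin{bmatrix} -7 \\ -6 \\ -5 \end{bmatrix} = \begin{bmatrix} 8 \\ -12 \\ 4 \end{bmatrix} = \mathbf{b}$$

This equation says that $\mathbf{b}$ is a linear combination of the columns of $A$, and then
by Definition CSM, we can say that $\mathbf{b} \in \mathcal{C}(A)$.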

On the other hand, Archetype E is the linear system $\mathcal{LS}(A, \mathbf{c})$, where the vector
of constants is

$$\mathbf{c} = \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix}$$

and this system of equations is inconsistent. This means $\mathbf{c} \notin \mathcal{C}(A)$, for if it were,
then it would equal a linear combination of the columns of $A$ and Theorem SLSLC
would lead us to a solution of the system $\mathcal{LS}(A, \mathbf{c})$. △


So  if  we  fix  the  coefficient  matrix,  and  vary  the  vector  of  constants,  we  can 
sometimes  find  consistent  systems,  and  sometimes  inconsistent  systems.  The  vectors 
of  constants  that  lead  to  consistent  systems  are  exactly  the  elements  of  the  column 
space.  This  is  the  content  of  the  next  theorem,  and  since  it  is  an  equivalence,  it 
provides  an  alternate  view  of  the  column  space. 


Theorem  CSCS  Column  Spaces  and  Consistent  Systems 

Suppose $A$ is an $m \times n$ matrix and $\mathbf{b}$ is a vector of size $m$. Then $\mathbf{b} \in \mathcal{C}(A)$ if and
only if $\mathcal{LS}(A, \mathbf{b})$ is consistent.


Proof. ($\Rightarrow$) Suppose $\mathbf{b} \in \mathcal{C}(A)$. Then we can write $\mathbf{b}$ as some linear combination
of the columns of $A$. By Theorem SLSLC we can use the scalars from this linear
combination to form a solution to $\mathcal{LS}(A, \mathbf{b})$, so this system is consistent.

($\Leftarrow$) If $\mathcal{LS}(A, \mathbf{b})$ is consistent, there is a solution that may be used with Theorem
SLSLC to write $\mathbf{b}$ as a linear combination of the columns of $A$. This qualifies $\mathbf{b}$ for
membership in $\mathcal{C}(A)$. ■


This theorem tells us that asking if the system $\mathcal{LS}(A, \mathbf{b})$ is consistent is exactly
the same question as asking if $\mathbf{b}$ is in the column space of $A$. Or equivalently, it tells
us that the column space of the matrix $A$ is precisely those vectors of constants, $\mathbf{b}$,
that can be paired with $A$ to create a system of linear equations $\mathcal{LS}(A, \mathbf{b})$ that is
consistent.

Employing Theorem SLEMM we can form the chain of equivalences

$$\mathbf{b} \in \mathcal{C}(A) \iff \mathcal{LS}(A, \mathbf{b}) \text{ is consistent} \iff A\mathbf{x} = \mathbf{b} \text{ for some } \mathbf{x}$$

Thus, an alternative (and popular) definition of the column space of an $m \times n$
matrix $A$ is

$$\mathcal{C}(A) = \{\mathbf{y} \in \mathbb{C}^m \mid \mathbf{y} = A\mathbf{x} \text{ for some } \mathbf{x} \in \mathbb{C}^n\} = \{A\mathbf{x} \mid \mathbf{x} \in \mathbb{C}^n\} \subseteq \mathbb{C}^m$$

We recognize this as saying: create all the matrix-vector products possible with
the matrix $A$ by letting $\mathbf{x}$ range over all of the possibilities. By Definition MVP
we see that this means taking all possible linear combinations of the columns of $A$,
which is precisely the definition of the column space (Definition CSM) we have chosen.

Notice how this formulation of the column space looks very much like the definition
of the null space of a matrix (Definition NSM), but for a rectangular matrix the
column vectors of $\mathcal{C}(A)$ and $\mathcal{N}(A)$ have different sizes, so the sets are very different.

Given a vector $\mathbf{b}$ and a matrix $A$ it is now very mechanical to test if $\mathbf{b} \in \mathcal{C}(A)$.
Form the linear system $\mathcal{LS}(A, \mathbf{b})$, row-reduce the augmented matrix, $[A \mid \mathbf{b}]$, and
test for consistency with Theorem RCLS. Here is an example of this procedure.


Example  MCSM  Membership  in  the  column  space  of  a matrix 
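Consider the column space of the $3 \times 4$ matrix $A$,

$$A = \begin{bmatrix} 3 & 2 & 1 & -4 \\ -1 & 1 & -2 & 3 \\ 2 & -4 & 6 & -8 \end{bmatrix}$$

We first show that $\mathbf{v} = \begin{bmatrix} 18 \\ -6 \\ 12 \end{bmatrix}$ is in the column space of $A$, $\mathbf{v} \in \mathcal{C}(A)$. Theorem
CSCS says we need only check the consistency of $\mathcal{LS}(A, \mathbf{v})$. Form the augmented
matrix and row-reduce,

$$\begin{bmatrix} 3 & 2 & 1 & -4 & 18 \\ -1 & 1 & -2 & 3 & -6 \\ 2 & -4 & 6 & -8 & 12 \end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} \boxed{1} & 0 & 1 & -2 & 6 \\ 0 & \boxed{1} & -1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

Since the final column is not a pivot column, Theorem RCLS tells us the system
is consistent and therefore by Theorem CSCS, $\mathbf{v} \in \mathcal{C}(A)$.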

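If we wished to demonstrate explicitly that $\mathbf{v}$ is a linear combination of the
columns of $A$, we can find a solution (any solution) of $\mathcal{LS}(A, \mathbf{v})$ and use Theorem
SLSLC to construct the desired linear combination. For example, set the free variables
to $x_3 = 2$ and $x_4 = 1$. Then a solution has $x_2 = 1$ and $x_1 = 6$. Then by Theorem
SLSLC,

$$\mathbf{v} = \begin{bmatrix} 18 \\ -6 \\ 12 \end{bmatrix} = 6\begin{bmatrix} 3 \\ -1 \\ 2 \end{bmatrix} + 1\begin{bmatrix} 2 \\ 1 \\ -4 \end{bmatrix} + 2\begin{bmatrix} 1 \\ -2 \\ 6 \end{bmatrix} + 1\begin{bmatrix} -4 \\ 3 \\ -8 \end{bmatrix}$$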

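Now we show that $\mathbf{w} = \begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix}$ is not in the column space of $A$, $\mathbf{w} \notin \mathcal{C}(A)$.
Theorem CSCS says we need only check the consistency of $\mathcal{LS}(A, \mathbf{w})$. Form the
augmented matrix and row-reduce,

$$\begin{bmatrix} 3 & 2 & 1 & -4 & 2 \\ -1 & 1 & -2 & 3 & 1 \\ 2 & -4 & 6 & -8 & -3 \end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} \boxed{1} & 0 & 1 & -2 & 0 \\ 0 & \boxed{1} & -1 & 1 & 0 \\ 0 & 0 & 0 & 0 & \boxed{1} \end{bmatrix}$$

Since the final column is a pivot column, Theorem RCLS tells us the system is
inconsistent and therefore by Theorem CSCS, $\mathbf{w} \notin \mathcal{C}(A)$. △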
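
The membership test of Example MCSM is mechanical enough to sketch in code. The following is a minimal illustration in plain Python with exact fractions, not the book's Sage code; the helper names rref and in_column_space are assumptions introduced here. It row-reduces the augmented matrix and applies the criterion of Theorem RCLS: the system is consistent exactly when the final column is not a pivot column.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, 0-based pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c, at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]          # scale to get a leading 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:          # clear the rest of column c
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def in_column_space(A, b):
    """Theorem CSCS: b is in C(A) exactly when LS(A, b) is consistent."""
    augmented = [row + [entry] for row, entry in zip(A, b)]
    _, pivots = rref(augmented)
    return len(A[0]) not in pivots   # final column must not be a pivot column

A = [[3, 2, 1, -4], [-1, 1, -2, 3], [2, -4, 6, -8]]
v = [18, -6, 12]
w = [2, 1, -3]
print(in_column_space(A, v), in_column_space(A, w))
```

With the matrix and vectors of Example MCSM, the first test succeeds and the second fails, matching the conclusions of the example.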

Theorem  CSCS  completes  a collection  of  three  theorems,  and  one  definition, 
that  deserve  comment.  Many  questions  about  spans,  linear  independence,  null  space, 
column  spaces  and  similar  objects  can  be  converted  to  questions  about  systems 
of  equations  (homogeneous  or  not),  which  we  understand  well  from  our  previous 
results,  especially  those  in  Chapter  SLE.  These  previous  results  include  theorems 
like  Theorem  RCLS  which  allows  us  to  quickly  decide  consistency  of  a system,  and 
Theorem  BNS  which  allows  us  to  describe  solution  sets  for  homogeneous  systems 
compactly  as  the  span  of  a linearly  independent  set  of  column  vectors. 

The  table  below  lists  these  four  definitions  and  theorems  along  with  a brief 
reminder  of  the  statement  and  an  example  of  how  the  statement  is  used. 


Definition NSM
    Synopsis: Null space is solution set of homogeneous system
    Example:  General solution sets described by Theorem PSPHS

Theorem SLSLC
    Synopsis: Solutions for linear combinations with unknown scalars
    Example:  Deciding membership in spans

Theorem SLEMM
    Synopsis: System of equations represented by matrix-vector product
    Example:  Solution to $\mathcal{LS}(A, \mathbf{b})$ is $A^{-1}\mathbf{b}$ when $A$ is nonsingular

Theorem CSCS
    Synopsis: Column space vectors create consistent systems
    Example:  Deciding membership in column spaces

Subsection  CSSOC 

Column  Space  Spanned  by  Original  Columns 

So  we  have  a foolproof,  automated  procedure  for  determining  membership  in  C(A). 
While  this  works  just  fine  a vector  at  a time,  we  would  like  to  have  a more  useful 
description  of  the  set  C(A)  as  a whole.  The  next  example  will  preview  the  first  of 
two  fundamental  results  about  the  column  space  of  a matrix. 

Example  CSTW  Column  space,  two  ways 
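Consider the $5 \times 7$ matrix $A$,

$$A = \begin{bmatrix} 2 & 4 & 1 & -1 & 1 & 4 & 4 \\ 1 & 2 & 1 & 0 & 2 & 4 & 7 \\ 0 & 0 & 1 & 4 & 1 & 8 & 7 \\ 1 & 2 & -1 & 2 & 1 & 9 & 6 \\ -2 & -4 & 1 & 3 & -1 & -2 & -2 \end{bmatrix}$$

According to the definition (Definition CSM), the column space of $A$ is

$$\mathcal{C}(A) = \left\langle \left\{ \begin{bmatrix} 2 \\ 1 \\ 0 \\ 1 \\ -2 \end{bmatrix}, \begin{bmatrix} 4 \\ 2 \\ 0 \\ 2 \\ -4 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 4 \\ 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 1 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 4 \\ 4 \\ 8 \\ 9 \\ -2 \end{bmatrix}, \begin{bmatrix} 4 \\ 7 \\ 7 \\ 6 \\ -2 \end{bmatrix} \right\} \right\rangle$$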

While  this  is  a concise  description  of  an  infinite  set,  we  might  be  able  to  describe 
the  span  with  fewer  than  seven  vectors.  This  is  the  substance  of  Theorem  BS.  So  we 
take  these  seven  vectors  and  make  them  the  columns  of  a matrix,  which  is  simply 
the  original  matrix  A again.  Now  we  row-reduce, 


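$$\begin{bmatrix} 2 & 4 & 1 & -1 & 1 & 4 & 4 \\ 1 & 2 & 1 & 0 & 2 & 4 & 7 \\ 0 & 0 & 1 & 4 & 1 & 8 & 7 \\ 1 & 2 & -1 & 2 & 1 & 9 & 6 \\ -2 & -4 & 1 & 3 & -1 & -2 & -2 \end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} \boxed{1} & 2 & 0 & 0 & 0 & 3 & 1 \\ 0 & 0 & \boxed{1} & 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & \boxed{1} & 0 & 2 & 1 \\ 0 & 0 & 0 & 0 & \boxed{1} & 1 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$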

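The pivot columns are $D = \{1, 3, 4, 5\}$, so we can create the set

$$T = \left\{ \begin{bmatrix} 2 \\ 1 \\ 0 \\ 1 \\ -2 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 4 \\ 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 1 \\ 1 \\ -1 \end{bmatrix} \right\}$$

and know that $\mathcal{C}(A) = \langle T \rangle$ and $T$ is a linearly independent set of columns from the
set of columns of $A$. △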
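
The computation in Example CSTW can be reproduced with a short sketch. This is plain Python with exact fractions rather than Sage, and the helper rref is a name assumed here, not something defined in the text. It row-reduces the matrix, records the pivot columns, and then collects the columns of the original matrix with those indices. Note that the code counts columns from 0, so the pivot set {1, 3, 4, 5} appears as [0, 2, 3, 4].

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, 0-based pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

A = [[2, 4, 1, -1, 1, 4, 4],
     [1, 2, 1, 0, 2, 4, 7],
     [0, 0, 1, 4, 1, 8, 7],
     [1, 2, -1, 2, 1, 9, 6],
     [-2, -4, 1, 3, -1, -2, -2]]

B, pivots = rref(A)
# Columns of the ORIGINAL matrix A (not of B) at the pivot positions.
T = [[row[c] for row in A] for c in pivots]
print(pivots)
for column in T:
    print(column)
```

The important design point is in the second-to-last step: the pivot positions come from the reduced matrix, but the vectors themselves are taken from the original matrix.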


We  will  now  formalize  the  previous  example,  which  will  make  it  trivial  to  determine 
a linearly  independent  set  of  vectors  that  will  span  the  column  space  of  a matrix, 
and  is  constituted  of  just  columns  of  A. 

Theorem  BCS  Basis  of  the  Column  Space 

Suppose that $A$ is an $m \times n$ matrix with columns $\mathbf{A}_1, \mathbf{A}_2, \mathbf{A}_3, \ldots, \mathbf{A}_n$, and $B$ is
a row-equivalent matrix in reduced row-echelon form with $r$ pivot columns. Let
$D = \{d_1, d_2, d_3, \ldots, d_r\}$ be the set of indices for the pivot columns of $B$. Let
$T = \{\mathbf{A}_{d_1}, \mathbf{A}_{d_2}, \mathbf{A}_{d_3}, \ldots, \mathbf{A}_{d_r}\}$. Then


1. $T$ is a linearly independent set.
2. $\mathcal{C}(A) = \langle T \rangle$.

Proof.  Definition  CSM  describes  the  column  space  as  the  span  of  the  set  of  columns 
of  A.  Theorem  BS  tells  us  that  we  can  reduce  the  set  of  vectors  used  in  a span.  If 
we  apply  Theorem  BS  to  C(A),  we  would  collect  the  columns  of  A into  a matrix 
(which  would  just  be  A again)  and  bring  the  matrix  to  reduced  row-echelon  form, 
which  is  the  matrix  B in  the  statement  of  the  theorem.  In  this  case,  the  conclusions 
of  Theorem  BS  applied  to  A,  B and  C(A)  are  exactly  the  conclusions  we  desire.  ■ 

This  is  a nice  result  since  it  gives  us  a handful  of  vectors  that  describe  the  entire 
column  space  (through  the  span),  and  we  believe  this  set  is  as  small  as  possible 
because  we  cannot  create  any  more  relations  of  linear  dependence  to  trim  it  down 
further.  Furthermore,  we  defined  the  column  space  (Definition  CSM)  as  all  linear 
combinations  of  the  columns  of  the  matrix,  and  the  elements  of  the  set  T are  still 
columns  of  the  matrix  (we  will  not  be  so  lucky  in  the  next  two  constructions  of  the 
column  space). 

Procedurally this theorem is extremely easy to apply. Row-reduce the original
matrix, identify the $r$ pivot columns of the reduced matrix, and grab the columns of the
original matrix with the same indices as the pivot columns. But it is still important
to  study  the  proof  of  Theorem  BS  and  its  motivation  in  Example  COV  which  lie  at 
the  root  of  this  theorem.  We  will  trot  through  an  example  all  the  same. 

Example  CSOCD  Column  space,  original  columns,  Archetype  D 
Let  us  determine  a compact  expression  for  the  entire  column  space  of  the  coefficient 
matrix  of  the  system  of  equations  that  is  Archetype  D.  Notice  that  in  Example 
CSMCS we were only determining if individual vectors were in the column space or
not; now we are describing the entire column space.

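To start with the application of Theorem BCS, call the coefficient matrix $A$ and
row-reduce it to reduced row-echelon form $B$,

$$A = \begin{bmatrix} 2 & 1 & 7 & -7 \\ -3 & 4 & -5 & -6 \\ 1 & 1 & 4 & -5 \end{bmatrix} \qquad B = \begin{bmatrix} \boxed{1} & 0 & 3 & -2 \\ 0 & \boxed{1} & 1 & -3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

Since columns 1 and 2 are pivot columns, $D = \{1, 2\}$. To construct a set that
spans $\mathcal{C}(A)$, just grab the columns of $A$ with indices in $D$, so

$$\mathcal{C}(A) = \left\langle \left\{ \begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 4 \\ 1 \end{bmatrix} \right\} \right\rangle$$

That’s it.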

In Example CSMCS we determined that the vector

$$\mathbf{c} = \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix}$$

was not in the column space of $A$. Try to write $\mathbf{c}$ as a linear combination of the first
two columns of $A$. What happens?

Also in Example CSMCS we determined that the vector

$$\mathbf{b} = \begin{bmatrix} 8 \\ -12 \\ 4 \end{bmatrix}$$

was in the column space of $A$. Try to write $\mathbf{b}$ as a linear combination of the first
two columns of $A$. What happens? Did you find a unique solution to this question?
Hmmmm. △


Subsection  CSNM 

Column  Space  of  a Nonsingular  Matrix 

Let  us  specialize  to  square  matrices  and  contrast  the  column  spaces  of  the  coefficient 
matrices  in  Archetype  A and  Archetype  B. 

Example  CSAA  Column  space  of  Archetype  A 

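The coefficient matrix in Archetype A is $A$, which row-reduces to $B$,

$$A = \begin{bmatrix} 1 & -1 & 2 \\ 2 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix} \qquad B = \begin{bmatrix} \boxed{1} & 0 & 1 \\ 0 & \boxed{1} & -1 \\ 0 & 0 & 0 \end{bmatrix}$$

Columns 1 and 2 are pivot columns, so by Theorem BCS we can write

$$\mathcal{C}(A) = \langle \{\mathbf{A}_1, \mathbf{A}_2\} \rangle = \left\langle \left\{ \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} \right\} \right\rangle$$

We want to show in this example that $\mathcal{C}(A) \neq \mathbb{C}^3$. So take, for example, the
vector $\mathbf{b} = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}$. Then there is no solution to the system $\mathcal{LS}(A, \mathbf{b})$, or equivalently,
it is not possible to write $\mathbf{b}$ as a linear combination of $\mathbf{A}_1$ and $\mathbf{A}_2$. Try one of these
two computations yourself. (Or try both!) Since $\mathbf{b} \notin \mathcal{C}(A)$, the column space of $A$
cannot be all of $\mathbb{C}^3$. So by varying the vector of constants, it is possible to create
inconsistent systems of equations with this coefficient matrix (the vector $\mathbf{b}$ being
one such example).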

In Example MWIAA we wished to show that the coefficient matrix from Archetype A
was not invertible as a first example of a matrix without an inverse. Our
device there was to find an inconsistent linear system with $A$ as the coefficient
matrix. The vector of constants in that example was $\mathbf{b}$, deliberately chosen outside
the column space of $A$. △

Example CSAB Column space of Archetype B

The coefficient matrix in Archetype B, call it $B$ here, is known to be nonsingular
(see Example NM). By Theorem NMUS, the linear system $\mathcal{LS}(B, \mathbf{b})$ has a (unique)
solution for every choice of $\mathbf{b}$. Theorem CSCS then says that $\mathbf{b} \in \mathcal{C}(B)$ for all $\mathbf{b} \in \mathbb{C}^3$.
Stated differently, there is no way to build an inconsistent system with the coefficient
matrix $B$, but then we knew that already from Theorem NMUS. △

Example  CSAA  and  Example  CSAB  together  motivate  the  following  equivalence, 
which  says  that  nonsingular  matrices  have  column  spaces  that  are  as  big  as  possible. 

Theorem  CSNM  Column  Space  of  a Nonsingular  Matrix 

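Suppose $A$ is a square matrix of size $n$. Then $A$ is nonsingular if and only if
$\mathcal{C}(A) = \mathbb{C}^n$.

Proof. ($\Rightarrow$) Suppose $A$ is nonsingular. We wish to establish the set equality
$\mathcal{C}(A) = \mathbb{C}^n$. By Definition CSM, $\mathcal{C}(A) \subseteq \mathbb{C}^n$. To show that $\mathbb{C}^n \subseteq \mathcal{C}(A)$ choose $\mathbf{b} \in \mathbb{C}^n$. By
Theorem NMUS, we know the linear system $\mathcal{LS}(A, \mathbf{b})$ has a (unique) solution and
therefore is consistent. Theorem CSCS then says that $\mathbf{b} \in \mathcal{C}(A)$. So by Definition
SE, $\mathcal{C}(A) = \mathbb{C}^n$.

($\Leftarrow$) If $\mathbf{e}_i$ is column $i$ of the $n \times n$ identity matrix (Definition SUV) and by
hypothesis $\mathcal{C}(A) = \mathbb{C}^n$, then $\mathbf{e}_i \in \mathcal{C}(A)$ for $1 \leq i \leq n$. By Theorem CSCS, the system
$\mathcal{LS}(A, \mathbf{e}_i)$ is consistent for $1 \leq i \leq n$. Let $\mathbf{b}_i$ denote any one particular solution to
$\mathcal{LS}(A, \mathbf{e}_i)$, $1 \leq i \leq n$.

Define the $n \times n$ matrix $B = [\mathbf{b}_1 | \mathbf{b}_2 | \mathbf{b}_3 | \ldots | \mathbf{b}_n]$. Then

$$\begin{aligned} AB &= A[\mathbf{b}_1 | \mathbf{b}_2 | \mathbf{b}_3 | \ldots | \mathbf{b}_n] \\ &= [A\mathbf{b}_1 | A\mathbf{b}_2 | A\mathbf{b}_3 | \ldots | A\mathbf{b}_n] && \text{Definition MM} \\ &= [\mathbf{e}_1 | \mathbf{e}_2 | \mathbf{e}_3 | \ldots | \mathbf{e}_n] \\ &= I_n && \text{Definition SUV} \end{aligned}$$

So the matrix $B$ is a “right-inverse” for $A$. By Theorem NMRRI, $I_n$ is a nonsingular
matrix, so by Theorem NPNT both $A$ and $B$ are nonsingular. Thus, in
particular, $A$ is nonsingular. (Travis Osborne contributed to this proof.) ■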

With  this  equivalence  for  nonsingular  matrices  we  can  update  our  list,  Theorem 
NME3. 

Theorem  NME4  Nonsingular  Matrix  Equivalences,  Round  4 
Suppose  that  A is  a square  matrix  of  size  n.  The  following  are  equivalent. 

1. $A$ is nonsingular.

2. $A$ row-reduces to the identity matrix.

3. The null space of $A$ contains only the zero vector, $\mathcal{N}(A) = \{\mathbf{0}\}$.

4. The linear system $\mathcal{LS}(A, \mathbf{b})$ has a unique solution for every possible choice of $\mathbf{b}$.

5. The columns of $A$ are a linearly independent set.

6. $A$ is invertible.

7. The column space of $A$ is $\mathbb{C}^n$, $\mathcal{C}(A) = \mathbb{C}^n$.

Proof.  Since  Theorem  CSNM  is  an  equivalence,  we  can  add  it  to  the  list  in  Theorem 
NME3.  ■ 
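
Items 2 and 7 of Theorem NME4 suggest a quick computational check, sketched below in plain Python with exact fractions. The helper names rref and column_space_is_everything are assumptions made for this illustration. The first matrix is the coefficient matrix of Archetype A from Example CSAA; the second matrix M is a hypothetical nonsingular matrix chosen only for contrast and does not come from the text.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, 0-based pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def column_space_is_everything(A):
    """For square A: a pivot in every column means A row-reduces to the
    identity, so A is nonsingular and, by Theorem CSNM, C(A) = C^n."""
    _, pivots = rref(A)
    return len(pivots) == len(A)

archetype_A = [[1, -1, 2], [2, 1, 1], [1, 1, 0]]   # singular (Example CSAA)
M = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]              # hypothetical, nonsingular

print(column_space_is_everything(archetype_A))
print(column_space_is_everything(M))
```

For Archetype A only two columns are pivot columns, which agrees with the two-vector spanning set found in Example CSAA.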

Subsection  RSM 
Row  Space  of  a Matrix 

The  rows  of  a matrix  can  be  viewed  as  vectors,  since  they  are  just  lists  of  numbers, 
arranged  horizontally.  So  we  will  transpose  a matrix,  turning  rows  into  columns,  so 
we  can  then  manipulate  rows  as  column  vectors.  As  a result  we  will  be  able  to  make 
some  new  connections  between  row  operations  and  solutions  to  systems  of  equations. 
OK,  here  is  the  second  primary  definition  of  this  section. 

Definition  RSM  Row  Space  of  a Matrix 

Suppose $A$ is an $m \times n$ matrix. Then the row space of $A$, $\mathcal{R}(A)$, is the column
space of $A^t$, i.e. $\mathcal{R}(A) = \mathcal{C}(A^t)$. □

Informally,  the  row  space  is  the  set  of  all  linear  combinations  of  the  rows  of 
A.  However,  we  write  the  rows  as  column  vectors,  thus  the  necessity  of  using  the 
transpose  to  make  the  rows  into  columns.  Additionally,  with  the  row  space  defined 
in  terms  of  the  column  space,  all  of  the  previous  results  of  this  section  can  be  applied 
to  row  spaces. 

Notice that if $A$ is a rectangular $m \times n$ matrix, then $\mathcal{C}(A) \subseteq \mathbb{C}^m$, while $\mathcal{R}(A) \subseteq \mathbb{C}^n$
and the two sets are not comparable since they do not even hold objects of the same
type. However, when $A$ is square of size $n$, both $\mathcal{C}(A)$ and $\mathcal{R}(A)$ are subsets of $\mathbb{C}^n$,
though usually the sets will not be equal (but see Exercise CRS.M20).

Example  RSAI  Row  space  of  Archetype  I 
The coefficient matrix in Archetype I is

$$I = \begin{bmatrix} 1 & 4 & 0 & -1 & 0 & 7 & -9 \\ 2 & 8 & -1 & 3 & 9 & -13 & 7 \\ 0 & 0 & 2 & -3 & -4 & 12 & -8 \\ -1 & -4 & 2 & 4 & 8 & -31 & 37 \end{bmatrix}$$

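To build the row space, we transpose the matrix,

$$I^t = \begin{bmatrix} 1 & 2 & 0 & -1 \\ 4 & 8 & 0 & -4 \\ 0 & -1 & 2 & 2 \\ -1 & 3 & -3 & 4 \\ 0 & 9 & -4 & 8 \\ 7 & -13 & 12 & -31 \\ -9 & 7 & -8 & 37 \end{bmatrix}$$

Then the columns of this matrix are used in a span to build the row space,

$$\mathcal{R}(I) = \mathcal{C}(I^t) = \left\langle \left\{ \begin{bmatrix} 1 \\ 4 \\ 0 \\ -1 \\ 0 \\ 7 \\ -9 \end{bmatrix}, \begin{bmatrix} 2 \\ 8 \\ -1 \\ 3 \\ 9 \\ -13 \\ 7 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 2 \\ -3 \\ -4 \\ 12 \\ -8 \end{bmatrix}, \begin{bmatrix} -1 \\ -4 \\ 2 \\ 4 \\ 8 \\ -31 \\ 37 \end{bmatrix} \right\} \right\rangle$$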

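However, we can use Theorem BCS to get a slightly better description. First,
row-reduce $I^t$,

$$\begin{bmatrix} \boxed{1} & 0 & 0 & -\frac{31}{7} \\ 0 & \boxed{1} & 0 & \frac{12}{7} \\ 0 & 0 & \boxed{1} & \frac{13}{7} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

Since the pivot columns have indices $D = \{1, 2, 3\}$, the column space of $I^t$ can
be spanned by just the first three columns of $I^t$,

$$\mathcal{R}(I) = \mathcal{C}(I^t) = \left\langle \left\{ \begin{bmatrix} 1 \\ 4 \\ 0 \\ -1 \\ 0 \\ 7 \\ -9 \end{bmatrix}, \begin{bmatrix} 2 \\ 8 \\ -1 \\ 3 \\ 9 \\ -13 \\ 7 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 2 \\ -3 \\ -4 \\ 12 \\ -8 \end{bmatrix} \right\} \right\rangle$$

△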

The  row  space  would  not  be  too  interesting  if  it  was  simply  the  column  space  of 
the  transpose.  However,  when  we  do  row  operations  on  a matrix  we  have  no  effect 
on  the  many  linear  combinations  that  can  be  formed  with  the  rows  of  the  matrix. 
This  is  stated  more  carefully  in  the  following  theorem. 

Theorem  REMRS  Row-Equivalent  Matrices  have  equal  Row  Spaces 
Suppose $A$ and $B$ are row-equivalent matrices. Then $\mathcal{R}(A) = \mathcal{R}(B)$.

Proof.  Two  matrices  are  row-equivalent  (Definition  REM)  if  one  can  be  obtained 
from  another  by  a sequence  of  (possibly  many)  row  operations.  We  will  prove  the 
theorem  for  two  matrices  that  differ  by  a single  row  operation,  and  then  this  result 
can  be  applied  repeatedly  to  get  the  full  statement  of  the  theorem.  The  row  spaces 
of  A and  B are  spans  of  the  columns  of  their  transposes.  For  each  row  operation 
we  perform  on  a matrix,  we  can  define  an  analogous  operation  on  the  columns. 
Perhaps  we  should  call  these  column  operations.  Instead,  we  will  still  call  them 
row  operations,  but  we  will  apply  them  to  the  columns  of  the  transposes. 

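Refer to the columns of $A^t$ and $B^t$ as $\mathbf{A}_i$ and $\mathbf{B}_i$, $1 \leq i \leq m$. The row operation
that switches rows will just switch columns of the transposed matrices. This will
have no effect on the possible linear combinations formed by the columns.

Suppose that $B^t$ is formed from $A^t$ by multiplying column $\mathbf{A}_t$ by $\alpha \neq 0$. In other
words, $\mathbf{B}_t = \alpha\mathbf{A}_t$, and $\mathbf{B}_i = \mathbf{A}_i$ for all $i \neq t$. We need to establish that two sets
are equal, $\mathcal{C}(A^t) = \mathcal{C}(B^t)$. We will take a generic element of one and show that it is
contained in the other.

$$\begin{aligned} \beta_1\mathbf{B}_1 &+ \beta_2\mathbf{B}_2 + \beta_3\mathbf{B}_3 + \cdots + \beta_t\mathbf{B}_t + \cdots + \beta_m\mathbf{B}_m \\ &= \beta_1\mathbf{A}_1 + \beta_2\mathbf{A}_2 + \beta_3\mathbf{A}_3 + \cdots + \beta_t(\alpha\mathbf{A}_t) + \cdots + \beta_m\mathbf{A}_m \\ &= \beta_1\mathbf{A}_1 + \beta_2\mathbf{A}_2 + \beta_3\mathbf{A}_3 + \cdots + (\alpha\beta_t)\mathbf{A}_t + \cdots + \beta_m\mathbf{A}_m \end{aligned}$$

says that $\mathcal{C}(B^t) \subseteq \mathcal{C}(A^t)$. Similarly,

$$\begin{aligned} \gamma_1\mathbf{A}_1 &+ \gamma_2\mathbf{A}_2 + \gamma_3\mathbf{A}_3 + \cdots + \gamma_t\mathbf{A}_t + \cdots + \gamma_m\mathbf{A}_m \\ &= \gamma_1\mathbf{A}_1 + \gamma_2\mathbf{A}_2 + \gamma_3\mathbf{A}_3 + \cdots + \left(\frac{\gamma_t}{\alpha}\alpha\right)\mathbf{A}_t + \cdots + \gamma_m\mathbf{A}_m \\ &= \gamma_1\mathbf{A}_1 + \gamma_2\mathbf{A}_2 + \gamma_3\mathbf{A}_3 + \cdots + \frac{\gamma_t}{\alpha}(\alpha\mathbf{A}_t) + \cdots + \gamma_m\mathbf{A}_m \\ &= \gamma_1\mathbf{B}_1 + \gamma_2\mathbf{B}_2 + \gamma_3\mathbf{B}_3 + \cdots + \frac{\gamma_t}{\alpha}\mathbf{B}_t + \cdots + \gamma_m\mathbf{B}_m \end{aligned}$$

says that $\mathcal{C}(A^t) \subseteq \mathcal{C}(B^t)$. So $\mathcal{R}(A) = \mathcal{C}(A^t) = \mathcal{C}(B^t) = \mathcal{R}(B)$ when a single row
operation of the second type is performed.

Suppose now that $B^t$ is formed from $A^t$ by replacing $\mathbf{A}_t$ with $\alpha\mathbf{A}_s + \mathbf{A}_t$ for some
$\alpha \in \mathbb{C}$ and $s \neq t$. In other words, $\mathbf{B}_t = \alpha\mathbf{A}_s + \mathbf{A}_t$, and $\mathbf{B}_i = \mathbf{A}_i$ for $i \neq t$.

$$\begin{aligned} \beta_1\mathbf{B}_1 &+ \beta_2\mathbf{B}_2 + \cdots + \beta_s\mathbf{B}_s + \cdots + \beta_t\mathbf{B}_t + \cdots + \beta_m\mathbf{B}_m \\ &= \beta_1\mathbf{A}_1 + \beta_2\mathbf{A}_2 + \cdots + \beta_s\mathbf{A}_s + \cdots + \beta_t(\alpha\mathbf{A}_s + \mathbf{A}_t) + \cdots + \beta_m\mathbf{A}_m \\ &= \beta_1\mathbf{A}_1 + \beta_2\mathbf{A}_2 + \cdots + \beta_s\mathbf{A}_s + \cdots + (\beta_t\alpha)\mathbf{A}_s + \beta_t\mathbf{A}_t + \cdots + \beta_m\mathbf{A}_m \\ &= \beta_1\mathbf{A}_1 + \beta_2\mathbf{A}_2 + \cdots + \beta_s\mathbf{A}_s + (\beta_t\alpha)\mathbf{A}_s + \cdots + \beta_t\mathbf{A}_t + \cdots + \beta_m\mathbf{A}_m \\ &= \beta_1\mathbf{A}_1 + \beta_2\mathbf{A}_2 + \cdots + (\beta_s + \beta_t\alpha)\mathbf{A}_s + \cdots + \beta_t\mathbf{A}_t + \cdots + \beta_m\mathbf{A}_m \end{aligned}$$

says that $\mathcal{C}(B^t) \subseteq \mathcal{C}(A^t)$. Similarly,

$$\begin{aligned} \gamma_1\mathbf{A}_1 &+ \gamma_2\mathbf{A}_2 + \cdots + \gamma_s\mathbf{A}_s + \cdots + \gamma_t\mathbf{A}_t + \cdots + \gamma_m\mathbf{A}_m \\ &= \gamma_1\mathbf{A}_1 + \gamma_2\mathbf{A}_2 + \cdots + \gamma_s\mathbf{A}_s + \cdots + (-\alpha\gamma_t\mathbf{A}_s + \alpha\gamma_t\mathbf{A}_s) + \gamma_t\mathbf{A}_t + \cdots + \gamma_m\mathbf{A}_m \\ &= \gamma_1\mathbf{A}_1 + \gamma_2\mathbf{A}_2 + \cdots + (-\alpha\gamma_t + \gamma_s)\mathbf{A}_s + \cdots + \gamma_t(\alpha\mathbf{A}_s + \mathbf{A}_t) + \cdots + \gamma_m\mathbf{A}_m \\ &= \gamma_1\mathbf{B}_1 + \gamma_2\mathbf{B}_2 + \cdots + (-\alpha\gamma_t + \gamma_s)\mathbf{B}_s + \cdots + \gamma_t\mathbf{B}_t + \cdots + \gamma_m\mathbf{B}_m \end{aligned}$$

says that $\mathcal{C}(A^t) \subseteq \mathcal{C}(B^t)$. So $\mathcal{R}(A) = \mathcal{C}(A^t) = \mathcal{C}(B^t) = \mathcal{R}(B)$ when a single row
operation of the third type is performed.

So the row space of a matrix is preserved by each row operation, and hence row
spaces of row-equivalent matrices are equal sets. ■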


Example  RSREM  Row  spaces  of  two  row-equivalent  matrices 
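In Example TREM we saw that the matrices

$$A = \begin{bmatrix} 2 & -1 & 3 & 4 \\ 5 & 2 & -2 & 3 \\ 1 & 1 & 0 & 6 \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 1 & 0 & 6 \\ 3 & 0 & -2 & -9 \\ 2 & -1 & 3 & 4 \end{bmatrix}$$

are row-equivalent by demonstrating a sequence of two row operations that converted
$A$ into $B$. Applying Theorem REMRS we can say

$$\mathcal{R}(A) = \left\langle \left\{ \begin{bmatrix} 2 \\ -1 \\ 3 \\ 4 \end{bmatrix}, \begin{bmatrix} 5 \\ 2 \\ -2 \\ 3 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \\ 6 \end{bmatrix} \right\} \right\rangle = \left\langle \left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \\ 6 \end{bmatrix}, \begin{bmatrix} 3 \\ 0 \\ -2 \\ -9 \end{bmatrix}, \begin{bmatrix} 2 \\ -1 \\ 3 \\ 4 \end{bmatrix} \right\} \right\rangle = \mathcal{R}(B)$$

△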
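
One way to check the conclusion of Example RSREM computationally is sketched below, in plain Python with exact fractions. The helper names rref and row_space_basis are assumptions made here. The idea relies on Theorem REMRS twice: each matrix has the same row space as its reduced row-echelon form, so if the nonzero rows of the two reduced forms coincide, the two row spaces are equal.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, 0-based pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def row_space_basis(M):
    """Nonzero rows of the reduced row-echelon form of M."""
    R, pivots = rref(M)
    return R[:len(pivots)]   # in RREF the nonzero rows come first

A = [[2, -1, 3, 4], [5, 2, -2, 3], [1, 1, 0, 6]]
B = [[1, 1, 0, 6], [3, 0, -2, -9], [2, -1, 3, 4]]

print(row_space_basis(A) == row_space_basis(B))
```

Comparing reduced forms is a design choice for this sketch; the example in the text instead exhibits the two row operations directly.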


Theorem  REMRS  is  at  its  best  when  one  of  the  row-equivalent  matrices  is  in 
reduced  row-echelon  form.  The  vectors  that  are  zero  rows  can  be  ignored.  (Who 
needs  the  zero  vector  when  building  a span?  See  Exercise  LI.T10.)  The  echelon 
pattern ensures that the nonzero rows yield vectors that are linearly independent.
Here  is  the  theorem. 


Theorem  BRS  Basis  for  the  Row  Space 

Suppose that $A$ is a matrix and $B$ is a row-equivalent matrix in reduced row-echelon
form. Let $S$ be the set of nonzero columns of $B^t$. Then


1. $\mathcal{R}(A) = \langle S \rangle$.

2. $S$ is a linearly independent set.


Proof. From Theorem REMRS we know that $\mathcal{R}(A) = \mathcal{R}(B)$. If $B$ has any zero rows,
these are columns of $B^t$ that are the zero vector. We can safely toss out the zero
vector in the span construction, since it can be recreated from the nonzero vectors
by a linear combination where all the scalars are zero. So $\mathcal{R}(A) = \langle S \rangle$.

Suppose $B$ has $r$ nonzero rows and let $D = \{d_1, d_2, d_3, \ldots, d_r\}$ denote the
indices of the pivot columns of $B$. Denote the $r$ column vectors of $B^t$, the vectors
in $S$, as $\mathbf{B}_1, \mathbf{B}_2, \mathbf{B}_3, \ldots, \mathbf{B}_r$. To show that $S$ is linearly independent, start with a
relation of linear dependence

$$\alpha_1\mathbf{B}_1 + \alpha_2\mathbf{B}_2 + \alpha_3\mathbf{B}_3 + \cdots + \alpha_r\mathbf{B}_r = \mathbf{0}$$

Now consider this vector equality in location $d_i$. Since $B$ is in reduced row-echelon
form, the entries of column $d_i$ of $B$ are all zero, except for a leading 1 in row $i$. Thus,
in $B^t$, row $d_i$ is all zeros, excepting a 1 in column $i$. So, for $1 \leq i \leq r$,

$$\begin{aligned} 0 &= [\mathbf{0}]_{d_i} && \text{Definition ZCV} \\ &= [\alpha_1\mathbf{B}_1 + \alpha_2\mathbf{B}_2 + \alpha_3\mathbf{B}_3 + \cdots + \alpha_r\mathbf{B}_r]_{d_i} && \text{Definition RLDCV} \\ &= [\alpha_1\mathbf{B}_1]_{d_i} + [\alpha_2\mathbf{B}_2]_{d_i} + [\alpha_3\mathbf{B}_3]_{d_i} + \cdots + [\alpha_r\mathbf{B}_r]_{d_i} && \text{Definition MA} \\ &= \alpha_1[\mathbf{B}_1]_{d_i} + \alpha_2[\mathbf{B}_2]_{d_i} + \alpha_3[\mathbf{B}_3]_{d_i} + \cdots + \alpha_r[\mathbf{B}_r]_{d_i} && \text{Definition MSM} \\ &= \alpha_1(0) + \alpha_2(0) + \alpha_3(0) + \cdots + \alpha_i(1) + \cdots + \alpha_r(0) && \text{Definition RREF} \\ &= \alpha_i \end{aligned}$$

So we conclude that $\alpha_i = 0$ for all $1 \leq i \leq r$, establishing the linear independence
of $S$ (Definition LICV). ■


Example  IAS  Improving  a span 

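Suppose in the course of analyzing a matrix (its column space, its null space, its
. . . ) we encounter the following set of vectors, described by a span

$$X = \left\langle \left\{
\begin{bmatrix} 1 \\ 2 \\ 1 \\ 6 \\ 6 \end{bmatrix},\,
\begin{bmatrix} 3 \\ -1 \\ 2 \\ -1 \\ 6 \end{bmatrix},\,
\begin{bmatrix} 1 \\ -1 \\ 0 \\ -1 \\ -2 \end{bmatrix},\,
\begin{bmatrix} -3 \\ 2 \\ -3 \\ 6 \\ -10 \end{bmatrix}
\right\} \right\rangle$$

Let A be the matrix whose rows are the vectors in X, so by design $X = \mathcal{R}(A)$,

$$A = \begin{bmatrix}
1 & 2 & 1 & 6 & 6 \\
3 & -1 & 2 & -1 & 6 \\
1 & -1 & 0 & -1 & -2 \\
-3 & 2 & -3 & 6 & -10
\end{bmatrix}$$

Row-reduce A to form a row-equivalent matrix in reduced row-echelon form,

$$B = \begin{bmatrix}
1 & 0 & 0 & 2 & -1 \\
0 & 1 & 0 & 3 & 1 \\
0 & 0 & 1 & -2 & 5 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}$$

Then Theorem BRS says we can grab the nonzero columns of $B^t$ and write

$$X = \mathcal{R}(A) = \mathcal{R}(B) = \left\langle \left\{
\begin{bmatrix} 1 \\ 0 \\ 0 \\ 2 \\ -1 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 1 \\ 0 \\ 3 \\ 1 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 0 \\ 1 \\ -2 \\ 5 \end{bmatrix}
\right\} \right\rangle$$

These three vectors provide a much-improved description of X.  There are fewer
vectors, and the pattern of zeros and ones in the first three entries makes it easier to
determine membership in X.  △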
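The row-reduction in Example IAS can be checked mechanically.  What follows is a minimal sketch in plain Python rather than Sage, offered as an illustration and not as the text's own method: the helper name `rref` is our own, and it is an ordinary Gauss-Jordan routine over exact fractions.  The list `S` collects the nonzero rows of the result, which Theorem BRS says span the row space.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form via Gauss-Jordan elimination, in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # Look for a nonzero entry in this column, at or below the current pivot row.
        source = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if source is None:
            continue  # no pivot in this column
        m[pivot_row], m[source] = m[source], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]  # scale to a leading 1
        for r in range(len(m)):
            if r != pivot_row:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

# Rows of A are the four vectors of X from Example IAS.
A = [[1, 2, 1, 6, 6],
     [3, -1, 2, -1, 6],
     [1, -1, 0, -1, -2],
     [-3, 2, -3, 6, -10]]

B = rref(A)
# Theorem BRS: the nonzero rows of B, read as column vectors, span the row space of A.
S = [row for row in B if any(x != 0 for x in row)]
for v in S:
    print([str(x) for x in v])
```

Only three vectors survive, matching the improved span written in the example.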


Notice  that  in  Example  IAS  all  we  had  to  do  was  row-reduce  the  right  matrix  and 
toss  out  a zero  row.  Next  to  row  operations  themselves,  Theorem  BRS  is  probably 
the  most  powerful  computational  technique  at  your  disposal  as  it  quickly  provides  a 
much  improved  description  of  a span,  any  span  (row  space,  column  space,  . . . ). 

Theorem  BRS  and  the  techniques  of  Example  IAS  will  provide  yet  another 
description  of  the  column  space  of  a matrix.  First  we  state  a triviality  as  a theorem, 
so  we  can  reference  it  later. 


Theorem  CSRST  Column  Space,  Row  Space,  Transpose 
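Suppose A is a matrix.  Then $\mathcal{C}(A) = \mathcal{R}(A^t)$.


Proof.

\begin{align*}
\mathcal{C}(A) &= \mathcal{C}\left((A^t)^t\right) && \text{Theorem TT} \\
               &= \mathcal{R}(A^t) && \text{Definition RSM}
\end{align*}
■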

So  to  find  another  expression  for  the  column  space  of  a matrix,  build  its  transpose, 
row-reduce  it,  toss  out  the  zero  rows,  and  convert  the  nonzero  rows  to  column  vectors 
to  yield  an  improved  set  for  the  span  construction.  We  will  do  Archetype  I,  then 
you  do  Archetype  J. 


Example  CSROI  Column  space  from  row  operations,  Archetype  I 

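To find the column space of the coefficient matrix of Archetype I, we proceed as
follows.  The matrix is

$$I = \begin{bmatrix}
1 & 4 & 0 & -1 & 0 & 7 & -9 \\
2 & 8 & -1 & 3 & 9 & -13 & 7 \\
0 & 0 & 2 & -3 & -4 & 12 & -8 \\
-1 & -4 & 2 & 4 & 8 & -31 & 37
\end{bmatrix}$$

The transpose is

$$I^t = \begin{bmatrix}
1 & 2 & 0 & -1 \\
4 & 8 & 0 & -4 \\
0 & -1 & 2 & 2 \\
-1 & 3 & -3 & 4 \\
0 & 9 & -4 & 8 \\
7 & -13 & 12 & -31 \\
-9 & 7 & -8 & 37
\end{bmatrix}$$

Row-reduced this becomes,

$$\begin{bmatrix}
1 & 0 & 0 & -\frac{31}{7} \\
0 & 1 & 0 & \frac{12}{7} \\
0 & 0 & 1 & \frac{13}{7} \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}$$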

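Now, using Theorem CSRST and Theorem BRS

$$\mathcal{C}(I) = \mathcal{R}(I^t) = \left\langle \left\{
\begin{bmatrix} 1 \\ 0 \\ 0 \\ -\frac{31}{7} \end{bmatrix},\,
\begin{bmatrix} 0 \\ 1 \\ 0 \\ \frac{12}{7} \end{bmatrix},\,
\begin{bmatrix} 0 \\ 0 \\ 1 \\ \frac{13}{7} \end{bmatrix}
\right\} \right\rangle$$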

This is a very nice description of the column space.  Fewer vectors than the 7
involved in the definition, and the pattern of the zeros and ones in the first 3 slots
can be used to advantage.  For example, Archetype I is presented as a consistent
system of equations with a vector of constants

$$b = \begin{bmatrix} 3 \\ 9 \\ 1 \\ 4 \end{bmatrix}$$

Since $\mathcal{LS}(I, b)$ is consistent, Theorem CSCS tells us that $b \in \mathcal{C}(I)$.  But we could
see this quickly with the following computation, which really only involves any work
in the 4th entry of the vectors as the scalars in the linear combination are dictated
by the first three entries of b.

$$b = \begin{bmatrix} 3 \\ 9 \\ 1 \\ 4 \end{bmatrix}
= 3 \begin{bmatrix} 1 \\ 0 \\ 0 \\ -\frac{31}{7} \end{bmatrix}
+ 9 \begin{bmatrix} 0 \\ 1 \\ 0 \\ \frac{12}{7} \end{bmatrix}
+ 1 \begin{bmatrix} 0 \\ 0 \\ 1 \\ \frac{13}{7} \end{bmatrix}$$

Can you now rapidly construct several vectors, b, so that $\mathcal{LS}(I, b)$ is consistent,
and several more so that the system is inconsistent?  △
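The recipe of Example CSROI (transpose, row-reduce, discard zero rows, then test membership using the pattern of zeros and ones) can be sketched in a few lines.  This is an illustrative sketch in plain Python, not the Sage code the text refers to; the helper names `rref` and `in_span` are our own choices.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form via Gauss-Jordan elimination, in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        source = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if source is None:
            continue
        m[pivot_row], m[source] = m[source], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        for r in range(len(m)):
            if r != pivot_row:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

def in_span(basis, b):
    """Membership test for a basis produced by Theorem BRS.

    Each basis vector has a leading 1 in a slot where the others are 0,
    so the entry of b in that slot is the only possible scalar.
    """
    combo = [Fraction(0)] * len(b)
    for v in basis:
        pivot = next(j for j, x in enumerate(v) if x != 0)
        combo = [c + b[pivot] * x for c, x in zip(combo, v)]
    return combo == [Fraction(x) for x in b]

# Coefficient matrix of Archetype I.
I = [[1, 4, 0, -1, 0, 7, -9],
     [2, 8, -1, 3, 9, -13, 7],
     [0, 0, 2, -3, -4, 12, -8],
     [-1, -4, 2, 4, 8, -31, 37]]

I_transpose = [list(col) for col in zip(*I)]
basis = [row for row in rref(I_transpose) if any(x != 0 for x in row)]

print(in_span(basis, [3, 9, 1, 4]))   # the vector of constants of Archetype I
print(in_span(basis, [3, 9, 1, 5]))   # same first three entries, altered 4th entry
```

Changing only the fourth entry of b is one quick way to manufacture an inconsistent system, since the first three entries already fix the scalars.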
Reading Questions


1.  Write the column space of the matrix below as the span of a set of three vectors and
explain your choice of method.

$$\begin{bmatrix}
1 & 3 & 1 & 3 \\
2 & 0 & 1 & 1 \\
-1 & 2 & 1 & 0
\end{bmatrix}$$


2.  Suppose  that  A is  an  n x n nonsingular  matrix.  What  can  you  say  about  its  column 
space? 


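3.  Is the vector

$$\begin{bmatrix} 0 \\ 5 \\ 2 \\ 3 \end{bmatrix}$$

in the row space of the following matrix?  Why or why not?

$$\begin{bmatrix}
1 & 3 & 1 & 3 \\
2 & 0 & 1 & 1 \\
-1 & 2 & 1 & 0
\end{bmatrix}$$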


Exercises 

C20  For each matrix below, find a set of linearly independent vectors X so that $\langle X \rangle$
equals the column space of the matrix, and a set of linearly independent vectors Y so that
$\langle Y \rangle$ equals the row space of the matrix.

$$A = \begin{bmatrix}
1 & 2 & 3 & 1 \\
0 & 1 & 1 & 2 \\
1 & -1 & 2 & 3 \\
1 & 1 & 2 & -1
\end{bmatrix}$$


2 111 
2-145 
1112 


C = 


0 ‘ 
3 

-3 

-1 

-1 


From  your  results  for  these  three  matrices,  can  you  formulate  a conjecture  about  the  sets 
X and  Y? 

C30†  Example CSOCD expresses the column space of the coefficient matrix from
Archetype D (call the matrix A here) as the span of the first two columns of A.  In Example
CSMCS we determined that the vector

$$c = \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix}$$

was not in the column space of A and that the vector

$$b = \begin{bmatrix} 8 \\ -12 \\ 4 \end{bmatrix}$$

was in the column space of A.  Attempt to write c and b as linear combinations of the two
vectors in the span construction for the column space in Example CSOCD and record your
observations.
C31†  For the matrix A below find a set of vectors T meeting the following requirements:
(1) the span of T is the column space of A, that is, $\langle T \rangle = \mathcal{C}(A)$, (2) T is linearly independent,
and (3) the elements of T are columns of A.

$$A = \begin{bmatrix}
2 & 1 & 4 & -1 & 2 \\
1 & -1 & 5 & 1 & 1 \\
-1 & 2 & -7 & 0 & 1 \\
2 & -1 & 8 & -1 & 2
\end{bmatrix}$$

C32  In Example CSAA, verify that the vector b is not in the column space of the
coefficient matrix.

C33†  Find a linearly independent set S so that the span of S, $\langle S \rangle$, is the row space
of the matrix B.

$$B = \begin{bmatrix}
2 & 3 & 1 & 1 \\
1 & 1 & 0 & 1 \\
-1 & 2 & 3 & -4
\end{bmatrix}$$


C34†  For the $3 \times 4$ matrix A and the column vector $y \in \mathbb{C}^4$ given below, determine if y
is in the row space of A.  In other words, answer the question: $y \in \mathcal{R}(A)$?


A = 

'-2 

7 

6 

-3 

7 

0 

-l' 

-3 

y = 

' 2 ' 
1 
q 

8 

0 

7 

6 

o 

-2 

C35†  For the matrix A below, find two different linearly independent sets whose spans
equal the column space of A, $\mathcal{C}(A)$, such that

1.  the elements are each columns of A.

2.  the set is obtained by a procedure that is substantially different from the procedure
you use in part (1).

$$A = \begin{bmatrix}
3 & 5 & 1 & -2 \\
1 & 2 & 3 & 3 \\
-3 & -4 & 7 & 13
\end{bmatrix}$$


C40  The  following  archetypes  are  systems  of  equations.  For  each  system,  write  the  vector 
of  constants  as  a linear  combination  of  the  vectors  in  the  span  construction  for  the  column 
space  provided  by  Theorem  BCS  (these  vectors  are  listed  for  each  of  these  archetypes). 


Archetype  A,  Archetype  B,  Archetype  C,  Archetype  D,  Archetype  E,  Archetype  F,  Archetype 
G,  Archetype  H,  Archetype  I,  Archetype  J 

C42  The following archetypes are either matrices or systems of equations with coefficient
matrices.  For each matrix, compute a set of column vectors such that (1) the vectors are
columns of the matrix, (2) the set is linearly independent, and (3) the span of the set is
the column space of the matrix.  See Theorem BCS.

Archetype A, Archetype B, Archetype C, Archetype D/Archetype E, Archetype F, Archetype
G/Archetype H, Archetype I, Archetype J, Archetype K, Archetype L

C50  The  following  archetypes  are  either  matrices  or  systems  of  equations  with  coefficient 
matrices.  For  each  matrix,  compute  a set  of  column  vectors  such  that  (1)  the  set  is  linearly 
independent,  and  (2)  the  span  of  the  set  is  the  row  space  of  the  matrix.  See  Theorem  BRS. 

Archetype  A,  Archetype  B,  Archetype  C,  Archetype  D/Archetype  E,  Archetype  F,  Archetype 
G/Archetype  H,  Archetype  I,  Archetype  J,  Archetype  K,  Archetype  L 
C51  The  following  archetypes  are  either  matrices  or  systems  of  equations  with  coefficient 
matrices.  For  each  matrix,  compute  the  column  space  as  the  span  of  a linearly  independent 
set  as  follows:  transpose  the  matrix,  row-reduce,  toss  out  zero  rows,  convert  rows  into 
column  vectors.  See  Example  CSROI. 

Archetype A, Archetype B, Archetype C, Archetype D/Archetype E, Archetype F, Archetype
G/Archetype H, Archetype I, Archetype J, Archetype K, Archetype L

C52  The  following  archetypes  are  systems  of  equations.  For  each  different  coefficient 
matrix  build  two  new  vectors  of  constants.  The  first  should  lead  to  a consistent  system 
and  the  second  should  lead  to  an  inconsistent  system.  Descriptions  of  the  column  space  as 
spans  of  linearly  independent  sets  of  vectors  with  “nice  patterns”  of  zeros  and  ones  might 
be  most  useful  and  instructive  in  connection  with  this  exercise.  (See  the  end  of  Example 
CSROI.) 


Archetype  A,  Archetype  B,  Archetype  C,  Archetype  D/Archetype  E,  Archetype  F,  Archetype 
G/Archetype  H,  Archetype  I,  Archetype  J 


M10†  For the matrix E below, find vectors b and c so that the system $\mathcal{LS}(E, b)$ is
consistent and $\mathcal{LS}(E, c)$ is inconsistent.

$$E = \begin{bmatrix}
-2 & 1 & 1 & 0 \\
3 & -1 & 0 & 2 \\
4 & 1 & 1 & 6
\end{bmatrix}$$


M20†  Usually the column space and null space of a matrix contain vectors of different
sizes.  For  a square  matrix,  though,  the  vectors  in  these  two  sets  are  the  same  size.  Usually 
the  two  sets  will  be  different.  Construct  an  example  of  a square  matrix  where  the  column 
space  and  null  space  are  equal. 

M21†  We have a variety of theorems about how to create column spaces and row spaces
and they frequently involve row-reducing a matrix.  Here is a procedure that some try to
use to get a column space.  Begin with an $m \times n$ matrix A and row-reduce to a matrix B
with columns $B_1, B_2, B_3, \ldots, B_n$.  Then form the column space of A as

$$\mathcal{C}(A) = \langle \{B_1, B_2, B_3, \ldots, B_n\} \rangle = \mathcal{C}(B)$$

This is not a legitimate procedure, and therefore is not a theorem.  Construct an example
to show that the procedure will not in general create the column space of A.

T40†  Suppose that A is an $m \times n$ matrix and B is an $n \times p$ matrix.  Prove that the
column space of AB is a subset of the column space of A, that is $\mathcal{C}(AB) \subseteq \mathcal{C}(A)$.  Provide an
example where the opposite is false, in other words give an example where $\mathcal{C}(A) \not\subseteq \mathcal{C}(AB)$.
(Compare with Exercise MM.T40.)

T41†  Suppose that A is an $m \times n$ matrix and B is an $n \times n$ nonsingular matrix.  Prove
that the column space of A is equal to the column space of AB, that is $\mathcal{C}(A) = \mathcal{C}(AB)$.
(Compare with Exercise MM.T41 and Exercise CRS.T40.)

T45†  Suppose that A is an $m \times n$ matrix and B is an $n \times m$ matrix where AB is a
nonsingular matrix.  Prove that

1.  $\mathcal{N}(B) = \{0\}$

2.  $\mathcal{C}(B) \cap \mathcal{N}(A) = \{0\}$

Discuss the case when $m = n$ in connection with Theorem NPNT.


Section  FS 
Four  Subsets 


There  are  four  natural  subsets  associated  with  a matrix.  We  have  met  three  already: 
the  null  space,  the  column  space  and  the  row  space.  In  this  section  we  will  introduce 
a fourth,  the  left  null  space.  The  objective  of  this  section  is  to  describe  one  procedure 
that  will  allow  us  to  find  linearly  independent  sets  that  span  each  of  these  four  sets 
of  column  vectors.  Along  the  way,  we  will  make  a connection  with  the  inverse  of 
a matrix,  so  Theorem  FS  will  tie  together  most  all  of  this  chapter  (and  the  entire 
course  so  far). 

Subsection  LNS 
Left  Null  Space 

Definition  LNS  Left  Null  Space 

Suppose A is an $m \times n$ matrix.  Then the left null space is defined as
$\mathcal{L}(A) = \mathcal{N}(A^t) \subseteq \mathbb{C}^m$.  □

The left null space will not feature prominently in the sequel, but we can explain
its name and connect it to row operations.  Suppose $y \in \mathcal{L}(A)$.  Then by Definition
LNS, $A^t y = 0$.  We can then write

\begin{align*}
0^t &= (A^t y)^t && \text{Definition LNS} \\
    &= y^t (A^t)^t && \text{Theorem MMT} \\
    &= y^t A && \text{Theorem TT}
\end{align*}

The product $y^t A$ can be viewed as the components of y acting as the scalars in
a linear combination of the rows of A.  And the result is a "row vector", $0^t$, that is
totally  zeros.  When  we  apply  a sequence  of  row  operations  to  a matrix,  each  row  of 
the  resulting  matrix  is  some  linear  combination  of  the  rows.  These  observations  tell 
us  that  the  vectors  in  the  left  null  space  are  scalars  that  record  a sequence  of  row 
operations  that  result  in  a row  of  zeros  in  the  row-reduced  version  of  the  matrix. 
We  will  see  this  idea  more  explicitly  in  the  course  of  proving  Theorem  FS. 

Example  LNS  Left  null  space 
We will find the left null space of

$$A = \begin{bmatrix}
1 & -3 & 1 \\
-2 & 1 & 1 \\
1 & 5 & 1 \\
9 & -4 & 0
\end{bmatrix}$$

We transpose A and row-reduce,

$$A^t = \begin{bmatrix}
1 & -2 & 1 & 9 \\
-3 & 1 & 5 & -4 \\
1 & 1 & 1 & 0
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 2 \\
0 & 1 & 0 & -3 \\
0 & 0 & 1 & 1
\end{bmatrix}$$

Applying Definition LNS and Theorem BNS we have

$$\mathcal{L}(A) = \mathcal{N}(A^t) = \left\langle \left\{
\begin{bmatrix} -2 \\ 3 \\ -1 \\ 1 \end{bmatrix}
\right\} \right\rangle$$

If you row-reduce A you will discover one zero row in the reduced row-echelon
form.  This zero row is created by a sequence of row operations, which in total amounts
to a linear combination, with scalars $a_1 = -2$, $a_2 = 3$, $a_3 = -1$ and $a_4 = 1$, on the
rows of A and which results in the zero vector (check this!).  So the components of
the vector describing the left null space of A provide a relation of linear dependence
on the rows of A.  △
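The "check this!" in Example LNS is a short computation.  As a small sketch in plain Python (the variable names `y` and `combination` are ours, chosen for illustration), we form the linear combination of the rows of A with the entries of the left null space vector as scalars:

```python
# Rows of the matrix A from Example LNS.
A = [[1, -3, 1],
     [-2, 1, 1],
     [1, 5, 1],
     [9, -4, 0]]

# The vector spanning the left null space, used as scalars on the rows of A.
y = [-2, 3, -1, 1]

# Entry j of y^t A is the sum over rows i of y[i] * A[i][j].
combination = [sum(y[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]
print(combination)
```

Every entry of the result is zero, so the scalars in y give a nontrivial relation of linear dependence on the rows of A.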


Subsection  CCS 
Computing  Column  Spaces 

We  have  three  ways  to  build  the  column  space  of  a matrix.  First,  we  can  use  just  the 
definition,  Definition  CSM,  and  express  the  column  space  as  a span  of  the  columns  of 
the  matrix.  A second  approach  gives  us  the  column  space  as  the  span  of  some  of  the 
columns  of  the  matrix,  and  additionally,  this  set  is  linearly  independent  (Theorem 
BCS).  Finally,  we  can  transpose  the  matrix,  row-reduce  the  transpose,  kick  out 
zero  rows,  and  write  the  remaining  rows  as  column  vectors.  Theorem  CSRST  and 
Theorem  BRS  tell  us  that  the  resulting  vectors  are  linearly  independent  and  their 
span  is  the  column  space  of  the  original  matrix. 

We  will  now  demonstrate  a fourth  method  by  way  of  a rather  complicated  example. 
Study  this  example  carefully,  but  realize  that  its  main  purpose  is  to  motivate  a 
theorem  that  simplifies  much  of  the  apparent  complexity.  So  other  than  an  instructive 
exercise  or  two,  the  procedure  we  are  about  to  describe  will  not  be  a usual  approach 
to  computing  a column  space. 

Example  CSANS  Column  space  as  null  space 


Let us find the column space of the matrix A below with a new approach.

$$A = \begin{bmatrix}
10 & 0 & 3 & 8 & 7 \\
-16 & -1 & -4 & -10 & -13 \\
-6 & 1 & -3 & -6 & -6 \\
0 & 2 & -2 & -3 & -2 \\
3 & 0 & 1 & 2 & 3 \\
-1 & -1 & 1 & 1 & 0
\end{bmatrix}$$

By Theorem CSCS we know that the column vector b is in the column space
of A if and only if the linear system $\mathcal{LS}(A, b)$ is consistent.  So let us try to solve
this system in full generality, using a vector of variables for the vector of constants.
In other words, which vectors b lead to consistent systems?  Begin by forming the
augmented matrix $[A \mid b]$ with a general version of b,

$$[A \mid b] = \left[\begin{array}{ccccc|c}
10 & 0 & 3 & 8 & 7 & b_1 \\
-16 & -1 & -4 & -10 & -13 & b_2 \\
-6 & 1 & -3 & -6 & -6 & b_3 \\
0 & 2 & -2 & -3 & -2 & b_4 \\
3 & 0 & 1 & 2 & 3 & b_5 \\
-1 & -1 & 1 & 1 & 0 & b_6
\end{array}\right]$$


To  identify  solutions  we  will  bring  this  matrix  to  reduced  row-echelon  form. 
Despite  the  presence  of  variables  in  the  last  column,  there  is  nothing  to  stop  us 
from  doing  this,  except  numerical  computational  routines  cannot  be  used,  and  even 
some  of  the  symbolic  algebra  routines  do  some  unexpected  maneuvers  with  this 
computation.  So  do  it  by  hand.  Yes,  it  is  a bit  of  work.  But  worth  it.  We’ll  still  be 
here  when  you  get  back.  Notice  along  the  way  that  the  row  operations  are  exactly 
the  same  ones  you  would  do  if  you  were  just  row-reducing  the  coefficient  matrix 
alone,  say  in  connection  with  a homogeneous  system  of  equations.  The  column  with 
the $b_i$ acts as a sort of bookkeeping device.  There are many different possibilities for
the result, depending on what order you choose to perform the row operations, but
shortly we will all be on the same page.  If you want to match our work right now,
use row 5 to remove any occurrence of $b_1$ from the other entries of the last column,
and use row 6 to remove any occurrence of $b_2$ from the last column.  We have:

$$\left[\begin{array}{ccccc|c}
1 & 0 & 0 & 0 & 2 & b_3 - b_4 + 2b_5 - b_6 \\
0 & 1 & 0 & 0 & -3 & -2b_3 + 3b_4 - 3b_5 + 3b_6 \\
0 & 0 & 1 & 0 & 1 & b_3 + b_4 + 3b_5 + 3b_6 \\
0 & 0 & 0 & 1 & -2 & -2b_3 + b_4 - 4b_5 \\
0 & 0 & 0 & 0 & 0 & b_1 + 3b_3 - b_4 + 3b_5 + b_6 \\
0 & 0 & 0 & 0 & 0 & b_2 - 2b_3 + b_4 + b_5 - b_6
\end{array}\right]$$

Our goal is to identify those vectors b which make $\mathcal{LS}(A, b)$ consistent.  By
Theorem RCLS we know that the consistent systems are precisely those without a
pivot column in the last column.  Are the expressions in the last column of rows 5
and 6 equal to zero, or are they leading 1's?  The answer is: maybe.  It depends on b.
With a nonzero value for either of these expressions, we would scale the row and
produce a leading 1.  So we get a consistent system, and b is in the column space,
if and only if these two expressions are both simultaneously zero.  In other words,
members of the column space of A are exactly those vectors b that satisfy

\begin{align*}
b_1 + 3b_3 - b_4 + 3b_5 + b_6 &= 0 \\
b_2 - 2b_3 + b_4 + b_5 - b_6 &= 0
\end{align*}
Hmmm.  Looks suspiciously like a homogeneous system of two equations with
six variables.  If you have been playing along (and we hope you have) then you may
have a slightly different system, but you should have just two equations.  Form the
coefficient matrix and row-reduce (notice that the system above has a coefficient
matrix that is already in reduced row-echelon form).  We should all be together now
with the same matrix,

$$L = \begin{bmatrix}
1 & 0 & 3 & -1 & 3 & 1 \\
0 & 1 & -2 & 1 & 1 & -1
\end{bmatrix}$$

So, $\mathcal{C}(A) = \mathcal{N}(L)$ and we can apply Theorem BNS to obtain a linearly independent
set to use in a span construction,

$$\mathcal{C}(A) = \mathcal{N}(L) = \left\langle \left\{
\begin{bmatrix} -3 \\ 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix},\,
\begin{bmatrix} 1 \\ -1 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix},\,
\begin{bmatrix} -3 \\ -1 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix},\,
\begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\right\} \right\rangle$$


Whew!  As  a postscript  to  this  central  example,  you  may  wish  to  convince  yourself 
that  the  four  vectors  above  really  are  elements  of  the  column  space.  Do  they  create 
consistent  systems  with  A as  coefficient  matrix?  Can  you  recognize  the  constant 
vector  in  your  description  of  these  solution  sets? 

OK,  that  was  so  much  fun,  let  us  do  it  again.  But  simpler  this  time.  And  we  will 
all  get  the  same  results  all  the  way  through.  Doing  row  operations  by  hand  with 
variables  can  be  a bit  error  prone,  so  let  us  see  if  we  can  improve  the  process  some. 
Rather than row-reduce a column vector b full of variables, let us write $b = I_6 b$
and we will row-reduce the matrix $I_6$ and when we finish row-reducing, then we will
compute the matrix-vector product.  You should first convince yourself that we can
operate like this (this is the subject of a future homework exercise).

Rather than augmenting A with b, we will instead augment it with $I_6$ (does this
feel familiar?),

$$M = \left[\begin{array}{ccccc|cccccc}
10 & 0 & 3 & 8 & 7 & 1 & 0 & 0 & 0 & 0 & 0 \\
-16 & -1 & -4 & -10 & -13 & 0 & 1 & 0 & 0 & 0 & 0 \\
-6 & 1 & -3 & -6 & -6 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 2 & -2 & -3 & -2 & 0 & 0 & 0 & 1 & 0 & 0 \\
3 & 0 & 1 & 2 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\
-1 & -1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right]$$


We want to row-reduce the left-hand side of this matrix, but we will apply the
same row operations to the right-hand side as well.  And once we get the left-hand
side in reduced row-echelon form, we will continue on to put leading 1's in the final
two rows, as well as making pivot columns that contain these two additional leading
1's.  It is these additional row operations that will ensure that we all get to the same
place, since the reduced row-echelon form is unique (Theorem RREFU),

$$N = \left[\begin{array}{ccccc|cccccc}
1 & 0 & 0 & 0 & 2 & 0 & 0 & 1 & -1 & 2 & -1 \\
0 & 1 & 0 & 0 & -3 & 0 & 0 & -2 & 3 & -3 & 3 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 1 & 3 & 3 \\
0 & 0 & 0 & 1 & -2 & 0 & 0 & -2 & 1 & -4 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 3 & -1 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -2 & 1 & 1 & -1
\end{array}\right]$$


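We are after the final six columns of this matrix, which we will multiply by b

$$J = \begin{bmatrix}
0 & 0 & 1 & -1 & 2 & -1 \\
0 & 0 & -2 & 3 & -3 & 3 \\
0 & 0 & 1 & 1 & 3 & 3 \\
0 & 0 & -2 & 1 & -4 & 0 \\
1 & 0 & 3 & -1 & 3 & 1 \\
0 & 1 & -2 & 1 & 1 & -1
\end{bmatrix}$$

so

$$Jb = \begin{bmatrix}
0 & 0 & 1 & -1 & 2 & -1 \\
0 & 0 & -2 & 3 & -3 & 3 \\
0 & 0 & 1 & 1 & 3 & 3 \\
0 & 0 & -2 & 1 & -4 & 0 \\
1 & 0 & 3 & -1 & 3 & 1 \\
0 & 1 & -2 & 1 & 1 & -1
\end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \end{bmatrix}
= \begin{bmatrix}
b_3 - b_4 + 2b_5 - b_6 \\
-2b_3 + 3b_4 - 3b_5 + 3b_6 \\
b_3 + b_4 + 3b_5 + 3b_6 \\
-2b_3 + b_4 - 4b_5 \\
b_1 + 3b_3 - b_4 + 3b_5 + b_6 \\
b_2 - 2b_3 + b_4 + b_5 - b_6
\end{bmatrix}$$
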

So by applying the same row operations that row-reduce A to the identity matrix
(which we could do computationally once $I_6$ is placed alongside of A), we can then
arrive at the result of row-reducing a column of symbols where the vector of constants
usually resides.  Since the row-reduced version of A has two zero rows, for a consistent
system we require that

\begin{align*}
b_1 + 3b_3 - b_4 + 3b_5 + b_6 &= 0 \\
b_2 - 2b_3 + b_4 + b_5 - b_6 &= 0
\end{align*}

Now we are exactly back where we were on the first go-round.  Notice that we
obtain the matrix L as simply the last two rows and last six columns of N.  △
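The second pass through Example CSANS (augment A with the identity, row-reduce, and read L off the bottom-right corner) is easy to automate.  The following is a sketch in plain Python under our own naming (`rref`, `M`, `N`, `L`, `residuals`), not the text's Sage session.  It also confirms that every column of A lies in the null space of L, as it must if the column space of A equals the null space of L.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form via Gauss-Jordan elimination, in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        source = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if source is None:
            continue
        m[pivot_row], m[source] = m[source], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        for r in range(len(m)):
            if r != pivot_row:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

A = [[10, 0, 3, 8, 7],
     [-16, -1, -4, -10, -13],
     [-6, 1, -3, -6, -6],
     [0, 2, -2, -3, -2],
     [3, 0, 1, 2, 3],
     [-1, -1, 1, 1, 0]]
m, n = len(A), len(A[0])

# Augment A with the m x m identity matrix, then row-reduce the whole thing.
M = [A[i] + [int(i == j) for j in range(m)] for i in range(m)]
N = rref(M)

# r counts the rows whose first n entries are not all zero.
r = sum(1 for row in N if any(x != 0 for x in row[:n]))

# L is the last m - r rows and last m columns of N.
L = [row[n:] for row in N[r:]]
for row in L:
    print([str(x) for x in row])

# Each column of A should be sent to the zero vector by L.
columns_of_A = [list(col) for col in zip(*A)]
residuals = [[sum(l * y for l, y in zip(L_row, col)) for L_row in L]
             for col in columns_of_A]
```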

This  example  motivates  the  remainder  of  this  section,  so  it  is  worth  careful  study. 
You  might  attempt  to  mimic  the  second  approach  with  the  coefficient  matrices  of 
Archetype  I and  Archetype  J.  We  will  see  shortly  that  the  matrix  L contains  more 
information  about  A than  just  the  column  space. 

Subsection  EEF 
Extended  Echelon  Form 


The  final  matrix  that  we  row-reduced  in  Example  CSANS  should  look  familiar  in 
most  respects  to  the  procedure  we  used  to  compute  the  inverse  of  a nonsingular 
matrix,  Theorem  CINM.  We  will  now  generalize  that  procedure  to  matrices  that  are 
not  necessarily  nonsingular,  or  even  square.  First  a definition. 

Definition  EEF  Extended  Echelon  Form 

Suppose A is an $m \times n$ matrix.  Extend A on its right side with the addition of an
$m \times m$ identity matrix to form an $m \times (n + m)$ matrix M.  Use row operations to bring
M to reduced row-echelon form and call the result N.  N is the extended reduced
row-echelon form of A, and we will standardize on names for five submatrices (B,
C, J, K, L) of N.

Let B denote the $m \times n$ matrix formed from the first n columns of N and let J
denote the $m \times m$ matrix formed from the last m columns of N.  Suppose that B has
r nonzero rows.  Further partition N by letting C denote the $r \times n$ matrix formed
from all of the nonzero rows of B.  Let K be the $r \times m$ matrix formed from the first
r rows of J, while L will be the $(m - r) \times m$ matrix formed from the bottom $m - r$
rows of J.  Pictorially,

$$M = [A \mid I_m] \xrightarrow{\text{RREF}} N = [B \mid J] =
\left[\begin{array}{c|c} C & K \\ \hline 0 & L \end{array}\right]$$
□

Example SEEF  Submatrices of extended echelon form
We illustrate Definition EEF with the matrix A,


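$$A = \begin{bmatrix}
1 & -1 & -2 & 7 & 1 & 6 \\
-6 & 2 & -4 & -18 & -3 & -26 \\
4 & -1 & 4 & 10 & 2 & 17 \\
3 & -1 & 2 & 9 & 1 & 12
\end{bmatrix}$$

Adding the $4 \times 4$ identity matrix,

$$M = \left[\begin{array}{cccccc|cccc}
1 & -1 & -2 & 7 & 1 & 6 & 1 & 0 & 0 & 0 \\
-6 & 2 & -4 & -18 & -3 & -26 & 0 & 1 & 0 & 0 \\
4 & -1 & 4 & 10 & 2 & 17 & 0 & 0 & 1 & 0 \\
3 & -1 & 2 & 9 & 1 & 12 & 0 & 0 & 0 & 1
\end{array}\right]$$

and row-reducing, we obtain

$$N = \left[\begin{array}{cccccc|cccc}
1 & 0 & 2 & 1 & 0 & 3 & 0 & 1 & 1 & 1 \\
0 & 1 & 4 & -6 & 0 & -1 & 0 & 2 & 3 & 0 \\
0 & 0 & 0 & 0 & 1 & 2 & 0 & -1 & 0 & -2 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 2 & 2 & 1
\end{array}\right]$$

So we then obtain

$$B = \begin{bmatrix}
1 & 0 & 2 & 1 & 0 & 3 \\
0 & 1 & 4 & -6 & 0 & -1 \\
0 & 0 & 0 & 0 & 1 & 2 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\qquad
C = \begin{bmatrix}
1 & 0 & 2 & 1 & 0 & 3 \\
0 & 1 & 4 & -6 & 0 & -1 \\
0 & 0 & 0 & 0 & 1 & 2
\end{bmatrix}$$

$$J = \begin{bmatrix}
0 & 1 & 1 & 1 \\
0 & 2 & 3 & 0 \\
0 & -1 & 0 & -2 \\
1 & 2 & 2 & 1
\end{bmatrix}
\qquad
K = \begin{bmatrix}
0 & 1 & 1 & 1 \\
0 & 2 & 3 & 0 \\
0 & -1 & 0 & -2
\end{bmatrix}
\qquad
L = \begin{bmatrix} 1 & 2 & 2 & 1 \end{bmatrix}$$

You can observe (or verify) the properties of the following theorem with this
example.  △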


Theorem  PEEF  Properties  of  Extended  Echelon  Form 

Suppose  that  A is  an  m x n matrix  and  that  N is  its  extended  echelon  form.  Then 


1.  J is  nonsingular. 

2.  B = J A. 

3.  If $x \in \mathbb{C}^n$ and $y \in \mathbb{C}^m$, then $Ax = y$ if and only if $Bx = Jy$.

4.  C is in reduced row-echelon form, has no zero rows and has r pivot columns.
5.  L is in reduced row-echelon form, has no zero rows and has $m - r$ pivot columns.


Proof.  J is the result of applying a sequence of row operations to $I_m$, and therefore
J and $I_m$ are row-equivalent.  $\mathcal{LS}(I_m, 0)$ has only the zero solution, since $I_m$ is
nonsingular (Theorem NMRRI).  Thus, $\mathcal{LS}(J, 0)$ also has only the zero solution
(Theorem REMES, Definition ESYS) and J is therefore nonsingular (Definition
NSM).

To  prove  the  second  part  of  this  conclusion,  first  convince  yourself  that  row 
operations  and  the  matrix-vector  product  are  associative  operations.  By  this  we 
mean  the  following.  Suppose  that  F is  an  m x n matrix  that  is  row-equivalent  to 
the  matrix  G.  Apply  to  the  column  vector  F w the  same  sequence  of  row  operations 
that  converts  F to  G.  Then  the  result  is  Gw.  So  we  can  do  row  operations  on  the 
matrix,  then  do  a matrix-vector  product,  or  do  a matrix-vector  product  and  then  do 
row  operations  on  a column  vector,  and  the  result  will  be  the  same  either  way.  Since 
matrix  multiplication  is  defined  by  a collection  of  matrix-vector  products  (Definition 
MM),  the  matrix  product  FH  will  become  GH  if  we  apply  the  same  sequence  of 
row  operations  to  FH  that  convert  F to  G.  (This  argument  can  be  made  more 
rigorous using elementary matrices from the upcoming Subsection DM.EM and the
associative property of matrix multiplication established in Theorem MMA.)  Now
apply  these  observations  to  A. 

Write $AI_n = I_m A$ and apply the row operations that convert M to N.  A is
converted to B, while $I_m$ is converted to J, so we have $BI_n = JA$.  Simplifying the
left side gives the desired conclusion.

For the third conclusion, we now establish the two equivalences

$$Ax = y \iff JAx = Jy \iff Bx = Jy$$

The forward direction of the first equivalence is accomplished by multiplying
both sides of the matrix equality by J, while the backward direction is accomplished
by multiplying by the inverse of J (which we know exists by Theorem NI since J is
nonsingular).  The second equivalence is obtained simply by the substitutions given
by $JA = B$.

The  first  r rows  of  N are  in  reduced  row-echelon  form,  since  any  contiguous 
collection  of  rows  taken  from  a matrix  in  reduced  row-echelon  form  will  form  a 
matrix  that  is  again  in  reduced  row-echelon  form  (Exercise  RREF.T12).  Since  the 
matrix C is formed by removing the last m entries of each of these rows, the remainder
is still in reduced row-echelon form.  By its construction, C has no zero rows.  C has
r rows  and  each  contains  a leading  1,  so  there  are  r pivot  columns  in  C. 

The  final  m — r rows  of  N are  in  reduced  row-echelon  form,  since  any  contiguous 
collection  of  rows  taken  from  a matrix  in  reduced  row-echelon  form  will  form  a 
matrix  that  is  again  in  reduced  row-echelon  form.  Since  the  matrix  L is  formed  by 
removing the first n entries of each of these rows, and these entries are all zero (they
form  the  zero  rows  of  B),  the  remainder  is  still  in  reduced  row-echelon  form.  L is  the 
final  m — r rows  of  the  nonsingular  matrix  J,  so  none  of  these  rows  can  be  totally 
zero,  or  J would  not  row-reduce  to  the  identity  matrix.  L has  m — r rows  and  each 
contains  a leading  1,  so  there  are  m — r pivot  columns  in  L.  ■ 
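The second conclusion of Theorem PEEF, B = JA, can be spot-checked on the matrix of Example SEEF.  The sketch below is plain Python with names of our own choosing (`rref`, `JA`); it illustrates the statement on one matrix and is not a substitute for the proof.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form via Gauss-Jordan elimination, in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        source = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if source is None:
            continue
        m[pivot_row], m[source] = m[source], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        for r in range(len(m)):
            if r != pivot_row:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

# The matrix A of Example SEEF.
A = [[1, -1, -2, 7, 1, 6],
     [-6, 2, -4, -18, -3, -26],
     [4, -1, 4, 10, 2, 17],
     [3, -1, 2, 9, 1, 12]]
m, n = len(A), len(A[0])

# Extended echelon form: row-reduce [A | I_m].
N = rref([A[i] + [int(i == j) for j in range(m)] for i in range(m)])
B = [row[:n] for row in N]   # first n columns
J = [row[n:] for row in N]   # last m columns

# Matrix product J A, computed entry by entry.
JA = [[sum(J[i][k] * A[k][j] for k in range(m)) for j in range(n)]
      for i in range(m)]
print(JA == B)
```

Because J records the row operations applied to A, multiplying A by J on the left replays them all at once.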

Notice  that  in  the  case  where  A is  a nonsingular  matrix  we  know  that  the  reduced 
row-echelon form of A is the identity matrix (Theorem NMRRI), so $B = I_n$.  Then
the second conclusion above says $JA = B = I_n$, so J is the inverse of A.  Thus this
theorem  generalizes  Theorem  CINM,  though  the  result  is  a “left-inverse”  of  A rather 
than  a “right-inverse.” 

The  third  conclusion  of  Theorem  PEEF  is  the  most  telling.  It  says  that  x is  a 
solution to the linear system $\mathcal{LS}(A, y)$ if and only if x is a solution to the linear
system $\mathcal{LS}(B, Jy)$.  Or said differently, if we row-reduce the augmented matrix $[A \mid y]$
we will get the augmented matrix $[B \mid Jy]$.  The matrix J tracks the cumulative effect
of the row operations that convert A to reduced row-echelon form, here effectively
applying  them  to  the  vector  of  constants  in  a system  of  equations  having  A as  a 
coefficient  matrix.  When  A row-reduces  to  a matrix  with  zero  rows,  then  Jy  should 
also  have  zero  entries  in  the  same  rows  if  the  system  is  to  be  consistent. 
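As a brief worked illustration of this last remark, take the matrices of Example SEEF, where B has a single zero row (row 4) and the corresponding row of J is L = [1 2 2 1].  The two sample vectors below are our own choices, made only to show one consistent and one inconsistent case.

```latex
% Consistency of LS(A, y) requires the 4th entry of Jy to vanish:
[Jy]_4 = y_1 + 2y_2 + 2y_3 + y_4 = 0

% First column of A, y = (1, -6, 4, 3): consistent, as it must be
1 + 2(-6) + 2(4) + 3 = 0

% y = (1, 0, 0, 0): inconsistent
1 + 2(0) + 2(0) + 0 = 1 \neq 0
```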
Subsection  FS 
Four  Subsets 


With  all  the  preliminaries  in  place  we  can  state  our  main  result  for  this  section.  In 
essence  this  result  will  allow  us  to  say  that  we  can  find  linearly  independent  sets  to 
use  in  span  constructions  for  all  four  subsets  (null  space,  column  space,  row  space, 
left  null  space)  by  analyzing  only  the  extended  echelon  form  of  the  matrix,  and 
specifically,  just  the  two  submatrices  C and  L,  which  will  be  ripe  for  analysis  since 
they  are  already  in  reduced  row-echelon  form  (Theorem  PEEF). 

Theorem  FS  Four  Subsets 

Suppose A is an $m \times n$ matrix with extended echelon form N.  Suppose the reduced
row-echelon form of A has r nonzero rows.  Then C is the submatrix of N formed
from the first r rows and the first n columns and L is the submatrix of N formed
from the last m columns and the last $m - r$ rows.  Then

1.  The null space of A is the null space of C, $\mathcal{N}(A) = \mathcal{N}(C)$.

2.  The row space of A is the row space of C, $\mathcal{R}(A) = \mathcal{R}(C)$.

3.  The column space of A is the null space of L, $\mathcal{C}(A) = \mathcal{N}(L)$.

4.  The left null space of A is the row space of L, $\mathcal{L}(A) = \mathcal{R}(L)$.


Proof. First, $\mathcal{N}(A) = \mathcal{N}(B)$ since B is row-equivalent to A (Theorem REMES). The
zero rows of B represent equations that are always true in the homogeneous system
$\mathcal{LS}(B, \mathbf{0})$, so the removal of these equations will not change the solution set. Thus,
in turn, $\mathcal{N}(B) = \mathcal{N}(C)$.

Second, $\mathcal{R}(A) = \mathcal{R}(B)$ since B is row-equivalent to A (Theorem REMRS). The
zero rows of B contribute nothing to the span that is the row space of B, so the
removal of these rows will not change the row space. Thus, in turn, $\mathcal{R}(B) = \mathcal{R}(C)$.

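Third, we prove the set equality $\mathcal{C}(A) = \mathcal{N}(L)$ with Definition SE. Begin by
showing that $\mathcal{C}(A) \subseteq \mathcal{N}(L)$. Choose $\mathbf{y} \in \mathcal{C}(A) \subseteq \mathbb{C}^m$. Then there exists a vector
$\mathbf{x} \in \mathbb{C}^n$ such that $A\mathbf{x} = \mathbf{y}$ (Theorem CSCS). Then for $1 \le k \le m - r$,

\begin{align*}
[L\mathbf{y}]_k &= [J\mathbf{y}]_{r+k} && \text{L a submatrix of J} \\
&= [B\mathbf{x}]_{r+k} && \text{Theorem PEEF} \\
&= [O\mathbf{x}]_k && \text{Zero matrix a submatrix of B} \\
&= [\mathbf{0}]_k && \text{Theorem MMZM}
\end{align*}

So, for all $1 \le k \le m - r$, $[L\mathbf{y}]_k = [\mathbf{0}]_k$. So by Definition CVE we have $L\mathbf{y} = \mathbf{0}$
and thus $\mathbf{y} \in \mathcal{N}(L)$.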

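Now, show that $\mathcal{N}(L) \subseteq \mathcal{C}(A)$. Choose $\mathbf{y} \in \mathcal{N}(L) \subseteq \mathbb{C}^m$. Form the vector
$K\mathbf{y} \in \mathbb{C}^r$. The linear system $\mathcal{LS}(C, K\mathbf{y})$ is consistent since C is in reduced
row-echelon form and has no zero rows (Theorem PEEF). Let $\mathbf{x} \in \mathbb{C}^n$ denote a solution
to $\mathcal{LS}(C, K\mathbf{y})$.

Then for $1 \le j \le r$,

\begin{align*}
[B\mathbf{x}]_j &= [C\mathbf{x}]_j && \text{C a submatrix of B} \\
&= [K\mathbf{y}]_j && \mathbf{x} \text{ a solution to } \mathcal{LS}(C, K\mathbf{y}) \\
&= [J\mathbf{y}]_j && \text{K a submatrix of J}
\end{align*}

And for $r + 1 \le k \le m$,

\begin{align*}
[B\mathbf{x}]_k &= [O\mathbf{x}]_{k-r} && \text{Zero matrix a submatrix of B} \\
&= [\mathbf{0}]_{k-r} && \text{Theorem MMZM} \\
&= [L\mathbf{y}]_{k-r} && \mathbf{y} \text{ in } \mathcal{N}(L) \\
&= [J\mathbf{y}]_k && \text{L a submatrix of J}
\end{align*}

So for all $1 \le i \le m$, $[B\mathbf{x}]_i = [J\mathbf{y}]_i$ and by Definition CVE we have $B\mathbf{x} = J\mathbf{y}$.
From Theorem PEEF we know then that $A\mathbf{x} = \mathbf{y}$, and therefore $\mathbf{y} \in \mathcal{C}(A)$ (Theorem
CSCS). By Definition SE we now have $\mathcal{C}(A) = \mathcal{N}(L)$.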

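Fourth, we prove the set equality $\mathcal{L}(A) = \mathcal{R}(L)$ with Definition SE. Begin by
showing that $\mathcal{R}(L) \subseteq \mathcal{L}(A)$. Choose $\mathbf{y} \in \mathcal{R}(L) \subseteq \mathbb{C}^m$. Then there exists a vector
$\mathbf{w} \in \mathbb{C}^{m-r}$ such that $\mathbf{y} = L^t\mathbf{w}$ (Definition RSM, Theorem CSCS). Then for $1 \le i \le n$,

\begin{align*}
[A^t\mathbf{y}]_i &= \sum_{k=1}^{m} [A^t]_{ik} [\mathbf{y}]_k && \text{Theorem EMP} \\
&= \sum_{k=1}^{m} [A^t]_{ik} [L^t\mathbf{w}]_k && \text{Definition of } \mathbf{w} \\
&= \sum_{k=1}^{m} [A^t]_{ik} \sum_{\ell=1}^{m-r} [L^t]_{k\ell} [\mathbf{w}]_\ell && \text{Theorem EMP} \\
&= \sum_{k=1}^{m} \sum_{\ell=1}^{m-r} [A^t]_{ik} [L^t]_{k\ell} [\mathbf{w}]_\ell && \text{Property DCN} \\
&= \sum_{\ell=1}^{m-r} \sum_{k=1}^{m} [A^t]_{ik} [L^t]_{k\ell} [\mathbf{w}]_\ell && \text{Property CACN} \\
&= \sum_{\ell=1}^{m-r} \left( \sum_{k=1}^{m} [A^t]_{ik} [L^t]_{k\ell} \right) [\mathbf{w}]_\ell && \text{Property DCN} \\
&= \sum_{\ell=1}^{m-r} \left( \sum_{k=1}^{m} [A^t]_{ik} [J^t]_{k,r+\ell} \right) [\mathbf{w}]_\ell && \text{L a submatrix of J} \\
&= \sum_{\ell=1}^{m-r} [A^tJ^t]_{i,r+\ell} [\mathbf{w}]_\ell && \text{Theorem EMP} \\
&= \sum_{\ell=1}^{m-r} [(JA)^t]_{i,r+\ell} [\mathbf{w}]_\ell && \text{Theorem MMT} \\
&= \sum_{\ell=1}^{m-r} [B^t]_{i,r+\ell} [\mathbf{w}]_\ell && \text{Theorem PEEF} \\
&= \sum_{\ell=1}^{m-r} 0\,[\mathbf{w}]_\ell && \text{Zero rows in B} \\
&= 0 && \text{Property ZCN} \\
&= [\mathbf{0}]_i && \text{Definition ZCV}
\end{align*}

Since $[A^t\mathbf{y}]_i = [\mathbf{0}]_i$ for $1 \le i \le n$, Definition CVE implies that $A^t\mathbf{y} = \mathbf{0}$. This
means that $\mathbf{y} \in \mathcal{N}(A^t)$, which is to say $\mathbf{y} \in \mathcal{L}(A)$.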

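Now, show that $\mathcal{L}(A) \subseteq \mathcal{R}(L)$. Choose $\mathbf{y} \in \mathcal{L}(A) \subseteq \mathbb{C}^m$. The matrix J is
nonsingular (Theorem PEEF), so $J^t$ is also nonsingular (Theorem MIT) and therefore
the linear system $\mathcal{LS}(J^t, \mathbf{y})$ has a unique solution. Denote this solution as $\mathbf{x} \in \mathbb{C}^m$.
We will need to work with two “halves” of $\mathbf{x}$, which we will denote as $\mathbf{z}$ and $\mathbf{w}$ with
formal definitions given by

\begin{align*}
[\mathbf{z}]_j &= [\mathbf{x}]_j \quad 1 \le j \le r, & [\mathbf{w}]_k &= [\mathbf{x}]_{r+k} \quad 1 \le k \le m - r
\end{align*}

Now, for $1 \le j \le n$,

\begin{align*}
[C^t\mathbf{z}]_j &= \sum_{k=1}^{r} [C^t]_{jk} [\mathbf{z}]_k && \text{Theorem EMP} \\
&= \sum_{k=1}^{r} [C^t]_{jk} [\mathbf{z}]_k + \sum_{\ell=1}^{m-r} [O]_{j\ell} [\mathbf{w}]_\ell && \text{Definition ZM} \\
&= \sum_{k=1}^{r} [B^t]_{jk} [\mathbf{z}]_k + \sum_{\ell=1}^{m-r} [B^t]_{j,r+\ell} [\mathbf{w}]_\ell && \text{C, O submatrices of B} \\
&= \sum_{k=1}^{r} [B^t]_{jk} [\mathbf{x}]_k + \sum_{\ell=1}^{m-r} [B^t]_{j,r+\ell} [\mathbf{x}]_{r+\ell} && \text{Definitions of } \mathbf{z} \text{ and } \mathbf{w} \\
&= \sum_{k=1}^{r} [B^t]_{jk} [\mathbf{x}]_k + \sum_{k=r+1}^{m} [B^t]_{jk} [\mathbf{x}]_k && \text{Re-index second sum} \\
&= \sum_{k=1}^{m} [B^t]_{jk} [\mathbf{x}]_k && \text{Combine sums} \\
&= \sum_{k=1}^{m} [(JA)^t]_{jk} [\mathbf{x}]_k && \text{Theorem PEEF} \\
&= \sum_{k=1}^{m} [A^tJ^t]_{jk} [\mathbf{x}]_k && \text{Theorem MMT} \\
&= \sum_{k=1}^{m} \sum_{\ell=1}^{m} [A^t]_{j\ell} [J^t]_{\ell k} [\mathbf{x}]_k && \text{Theorem EMP} \\
&= \sum_{\ell=1}^{m} \sum_{k=1}^{m} [A^t]_{j\ell} [J^t]_{\ell k} [\mathbf{x}]_k && \text{Property CACN} \\
&= \sum_{\ell=1}^{m} [A^t]_{j\ell} \left( \sum_{k=1}^{m} [J^t]_{\ell k} [\mathbf{x}]_k \right) && \text{Property DCN} \\
&= \sum_{\ell=1}^{m} [A^t]_{j\ell} [J^t\mathbf{x}]_\ell && \text{Theorem EMP} \\
&= \sum_{\ell=1}^{m} [A^t]_{j\ell} [\mathbf{y}]_\ell && \text{Definition of } \mathbf{x} \\
&= [A^t\mathbf{y}]_j && \text{Theorem EMP} \\
&= [\mathbf{0}]_j && \mathbf{y} \in \mathcal{L}(A)
\end{align*}

So, by Definition CVE, $C^t\mathbf{z} = \mathbf{0}$ and the vector $\mathbf{z}$ gives us a linear combination
of the columns of $C^t$ that equals the zero vector. In other words, $\mathbf{z}$ gives a relation
of linear dependence on the rows of C. However, the rows of C are a linearly
independent set by Theorem BRS. According to Definition LICV we must conclude
that the entries of $\mathbf{z}$ are all zero, i.e. $\mathbf{z} = \mathbf{0}$.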

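Now, for $1 \le i \le m$, we have

\begin{align*}
[\mathbf{y}]_i &= [J^t\mathbf{x}]_i && \text{Definition of } \mathbf{x} \\
&= \sum_{k=1}^{m} [J^t]_{ik} [\mathbf{x}]_k && \text{Theorem EMP} \\
&= \sum_{k=1}^{r} [J^t]_{ik} [\mathbf{x}]_k + \sum_{k=r+1}^{m} [J^t]_{ik} [\mathbf{x}]_k && \text{Break apart sum} \\
&= \sum_{k=1}^{r} [J^t]_{ik} [\mathbf{z}]_k + \sum_{k=r+1}^{m} [J^t]_{ik} [\mathbf{w}]_{k-r} && \text{Definition of } \mathbf{z} \text{ and } \mathbf{w} \\
&= \sum_{k=1}^{r} [J^t]_{ik}\, 0 + \sum_{\ell=1}^{m-r} [J^t]_{i,r+\ell} [\mathbf{w}]_\ell && \mathbf{z} = \mathbf{0}, \text{ re-index} \\
&= 0 + \sum_{\ell=1}^{m-r} [L^t]_{i\ell} [\mathbf{w}]_\ell && \text{L a submatrix of J} \\
&= [L^t\mathbf{w}]_i && \text{Theorem EMP}
\end{align*}

So by Definition CVE, $\mathbf{y} = L^t\mathbf{w}$. The existence of $\mathbf{w}$ implies that $\mathbf{y} \in \mathcal{R}(L)$, and
therefore $\mathcal{L}(A) \subseteq \mathcal{R}(L)$. So by Definition SE we have $\mathcal{L}(A) = \mathcal{R}(L)$. ■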

The  first  two  conclusions  of  this  theorem  are  nearly  trivial.  But  they  set  up  a 
pattern  of  results  for  C that  is  reflected  in  the  latter  two  conclusions  about  L.  In 
total,  they  tell  us  that  we  can  compute  all  four  subsets  just  by  finding  null  spaces  and 
row  spaces.  This  theorem  does  not  tell  us  exactly  how  to  compute  these  subsets,  but 
instead  simply  expresses  them  as  null  spaces  and  row  spaces  of  matrices  in  reduced 
row-echelon  form  without  any  zero  rows  (C  and  L).  A linearly  independent  set  that 
spans  the  null  space  of  a matrix  in  reduced  row-echelon  form  can  be  found  easily 
with  Theorem  BNS.  It  is  an  even  easier  matter  to  find  a linearly  independent  set 
that  spans  the  row  space  of  a matrix  in  reduced  row-echelon  form  with  Theorem 
BRS,  especially  when  there  are  no  zero  rows  present.  So  an  application  of  Theorem 
FS  is  typically  followed  by  two  applications  each  of  Theorem  BNS  and  Theorem 
BRS. 
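
The computation that feeds Theorem FS can be sketched in a few lines of Python. This is only a minimal illustration using exact rational arithmetic from the standard library, not the Sage routines the text relies on; the function names `rref` and `extended_echelon_parts`, and the small rank-1 test matrix, are assumptions made for this sketch.

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced row-echelon form of a matrix given as a list of rows."""
    M = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(n_cols):
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((i for i in range(pivot_row, n_rows) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        lead = M[pivot_row][col]
        M[pivot_row] = [x / lead for x in M[pivot_row]]
        # Clear every other entry of the pivot column.
        for i in range(n_rows):
            if i != pivot_row and M[i][col] != 0:
                factor = M[i][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return M

def extended_echelon_parts(A):
    """Row-reduce [A | I_m] and return the submatrices C and L of Theorem FS."""
    m, n = len(A), len(A[0])
    identity = [[1 if i == j else 0 for j in range(m)] for i in range(m)]
    N = rref([list(A[i]) + identity[i] for i in range(m)])
    # r counts the rows of N that are nonzero within the first n columns.
    r = sum(1 for row in N if any(x != 0 for x in row[:n]))
    C = [row[:n] for row in N[:r]]   # first r rows, first n columns
    L = [row[n:] for row in N[r:]]   # last m - r rows, last m columns
    return C, L

# Hypothetical 3 x 2 matrix of rank 1, chosen only to keep the arithmetic small.
A = [[1, 2],
     [2, 4],
     [3, 6]]
C, L = extended_echelon_parts(A)
```

For this matrix, C is the single row (1, 2) and L has the two rows (1, 0, -1/3) and (0, 1, -2/3). Theorem BNS and Theorem BRS would then be applied to C and L by hand.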

The situation when r = m deserves comment, since now the matrix L has no
rows. What is $\mathcal{C}(A)$ when we try to apply Theorem FS and encounter $\mathcal{N}(L)$? One
interpretation of this situation is that L is the coefficient matrix of a homogeneous
system that has no equations. How hard is it to find a solution vector to this system?
Some thought will convince you that any proposed vector will qualify as a solution,
since it makes all of the equations true. So every possible vector is in the null space
of L and therefore $\mathcal{C}(A) = \mathcal{N}(L) = \mathbb{C}^m$. OK, perhaps this sounds like some twisted
argument from Alice in Wonderland. Let us try another argument that might solidly
convince you of this logic.

If r = m, when we row-reduce the augmented matrix of $\mathcal{LS}(A, \mathbf{b})$ the result
will have no zero rows, and all m leading 1's will lie in the first n columns, leaving
none for the final column, so by Theorem RCLS the system will be consistent. By
Theorem CSCS, $\mathbf{b} \in \mathcal{C}(A)$. Since $\mathbf{b}$ was arbitrary, every possible vector is in the
column space of A, so we again have $\mathcal{C}(A) = \mathbb{C}^m$. The situation when a matrix has
r = m is known by the term full rank, and in the case of a square matrix coincides
with nonsingularity (see Exercise FS.M50).

The properties of the matrix L described by this theorem can be explained
informally as follows. A column vector $\mathbf{y} \in \mathbb{C}^m$ is in the column space of A if the
linear system $\mathcal{LS}(A, \mathbf{y})$ is consistent (Theorem CSCS). By Theorem RCLS, the
reduced row-echelon form of the augmented matrix $[A \mid \mathbf{y}]$ of a consistent system
will have zeros in the bottom $m - r$ locations of the last column. By Theorem PEEF
this final column is the vector $J\mathbf{y}$ and so should then have zeros in the final $m - r$
locations. But since L comprises the final $m - r$ rows of J, this condition is expressed
by saying $\mathbf{y} \in \mathcal{N}(L)$.

Additionally,  the  rows  of  J are  the  scalars  in  linear  combinations  of  the  rows 
of  A that  create  the  rows  of  B.  That  is,  the  rows  of  J record  the  net  effect  of  the 
sequence  of  row  operations  that  takes  A to  its  reduced  row-echelon  form,  B.  This 
can  be  seen  in  the  equation  J A = B (Theorem  PEEF).  As  such,  the  rows  of  L are 
scalars  for  linear  combinations  of  the  rows  of  A that  yield  zero  rows.  But  such  linear 
combinations  are  precisely  the  elements  of  the  left  null  space.  So  any  element  of  the 
row  space  of  L is  also  an  element  of  the  left  null  space  of  A. 
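
The claim that the rows of L are scalars for linear combinations of the rows of A that yield zero rows can be checked directly. The sketch below is an illustration under stated assumptions: A is a hypothetical $3 \times 2$ matrix of rank 1 (not one of the text's examples), J was computed by hand from $[A \mid I_3]$, and `mat_mul` is an ad hoc helper.

```python
from fractions import Fraction as F

# Hypothetical example, not from the text.
A = [[1, 2],
     [2, 4],
     [3, 6]]
# J from row-reducing [A | I_3] by hand; here r = 1, so L is the last two rows of J.
J = [[0, 0, F(1, 3)],
     [1, 0, F(-1, 3)],
     [0, 1, F(-2, 3)]]

def mat_mul(X, Y):
    """Matrix product X*Y computed entry by entry (Theorem EMP)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

B = mat_mul(J, A)    # the reduced row-echelon form of A
L = J[1:]            # final m - r = 2 rows of J
LA = mat_mul(L, A)   # each row of L combines the rows of A into a zero row
```

The product `B` is the reduced row-echelon form of A, and every row of `LA` is zero, so each row of L is an element of the left null space of A.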

We  will  now  illustrate  Theorem  FS  with  a few  examples. 


Example  FS1  Four  subsets,  no.  1 

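In Example SEEF we found the five relevant submatrices of the extended echelon
form for the matrix

\begin{equation*}
A = \begin{bmatrix}
1 & -1 & -2 & 7 & 1 & 6 \\
-6 & 2 & -4 & -18 & -3 & -26 \\
4 & -1 & 4 & 10 & 2 & 17 \\
3 & -1 & 2 & 9 & 1 & 12
\end{bmatrix}
\end{equation*}

To apply Theorem FS we only need C and L,

\begin{equation*}
C = \begin{bmatrix}
1 & 0 & 2 & 1 & 0 & 3 \\
0 & 1 & 4 & -6 & 0 & -1 \\
0 & 0 & 0 & 0 & 1 & 2
\end{bmatrix}
\qquad
L = \begin{bmatrix} 1 & 2 & 2 & 1 \end{bmatrix}
\end{equation*}

Then we use Theorem FS to obtain

\begin{align*}
\mathcal{N}(A) = \mathcal{N}(C) &= \left\langle \left\{
\begin{bmatrix} -2 \\ -4 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix},\
\begin{bmatrix} -1 \\ 6 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix},\
\begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ -2 \\ 1 \end{bmatrix}
\right\} \right\rangle && \text{Theorem BNS} \\
\mathcal{R}(A) = \mathcal{R}(C) &= \left\langle \left\{
\begin{bmatrix} 1 \\ 0 \\ 2 \\ 1 \\ 0 \\ 3 \end{bmatrix},\
\begin{bmatrix} 0 \\ 1 \\ 4 \\ -6 \\ 0 \\ -1 \end{bmatrix},\
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 2 \end{bmatrix}
\right\} \right\rangle && \text{Theorem BRS} \\
\mathcal{C}(A) = \mathcal{N}(L) &= \left\langle \left\{
\begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix},\
\begin{bmatrix} -2 \\ 0 \\ 1 \\ 0 \end{bmatrix},\
\begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\right\} \right\rangle && \text{Theorem BNS} \\
\mathcal{L}(A) = \mathcal{R}(L) &= \left\langle \left\{
\begin{bmatrix} 1 \\ 2 \\ 2 \\ 1 \end{bmatrix}
\right\} \right\rangle && \text{Theorem BRS}
\end{align*}

Boom!

△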


Example  FS2  Four  subsets,  no.  2 

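Now let us return to the matrix A that we used to motivate this section in Example
CSANS,

\begin{equation*}
A = \begin{bmatrix}
10 & 0 & 3 & 8 & 7 \\
-16 & -1 & -4 & -10 & -13 \\
-6 & 1 & -3 & -6 & -6 \\
0 & 2 & -2 & -3 & -2 \\
3 & 0 & 1 & 2 & 3 \\
-1 & -1 & 1 & 1 & 0
\end{bmatrix}
\end{equation*}

We form the matrix M by adjoining the $6 \times 6$ identity matrix $I_6$,

\begin{equation*}
M = \begin{bmatrix}
10 & 0 & 3 & 8 & 7 & 1 & 0 & 0 & 0 & 0 & 0 \\
-16 & -1 & -4 & -10 & -13 & 0 & 1 & 0 & 0 & 0 & 0 \\
-6 & 1 & -3 & -6 & -6 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 2 & -2 & -3 & -2 & 0 & 0 & 0 & 1 & 0 & 0 \\
3 & 0 & 1 & 2 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\
-1 & -1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\end{equation*}

and row-reduce to obtain N,

\begin{equation*}
N = \begin{bmatrix}
1 & 0 & 0 & 0 & 2 & 0 & 0 & 1 & -1 & 2 & -1 \\
0 & 1 & 0 & 0 & -3 & 0 & 0 & -2 & 3 & -3 & 3 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 1 & 3 & 3 \\
0 & 0 & 0 & 1 & -2 & 0 & 0 & -2 & 1 & -4 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 3 & -1 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -2 & 1 & 1 & -1
\end{bmatrix}
\end{equation*}

To find the four subsets for A, we only need identify the $4 \times 5$ matrix C and the
$2 \times 6$ matrix L,

\begin{equation*}
C = \begin{bmatrix}
1 & 0 & 0 & 0 & 2 \\
0 & 1 & 0 & 0 & -3 \\
0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & -2
\end{bmatrix}
\qquad
L = \begin{bmatrix}
1 & 0 & 3 & -1 & 3 & 1 \\
0 & 1 & -2 & 1 & 1 & -1
\end{bmatrix}
\end{equation*}

Then we apply Theorem FS,

\begin{align*}
\mathcal{N}(A) = \mathcal{N}(C) &= \left\langle \left\{
\begin{bmatrix} -2 \\ 3 \\ -1 \\ 2 \\ 1 \end{bmatrix}
\right\} \right\rangle && \text{Theorem BNS} \\
\mathcal{R}(A) = \mathcal{R}(C) &= \left\langle \left\{
\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 2 \end{bmatrix},\
\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ -3 \end{bmatrix},\
\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 1 \end{bmatrix},\
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ -2 \end{bmatrix}
\right\} \right\rangle && \text{Theorem BRS} \\
\mathcal{C}(A) = \mathcal{N}(L) &= \left\langle \left\{
\begin{bmatrix} -3 \\ 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix},\
\begin{bmatrix} 1 \\ -1 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix},\
\begin{bmatrix} -3 \\ -1 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix},\
\begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\right\} \right\rangle && \text{Theorem BNS} \\
\mathcal{L}(A) = \mathcal{R}(L) &= \left\langle \left\{
\begin{bmatrix} 1 \\ 0 \\ 3 \\ -1 \\ 3 \\ 1 \end{bmatrix},\
\begin{bmatrix} 0 \\ 1 \\ -2 \\ 1 \\ 1 \\ -1 \end{bmatrix}
\right\} \right\rangle && \text{Theorem BRS}
\end{align*}

△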


The  next  example  is  just  a bit  different  since  the  matrix  has  more  rows  than 
columns,  and  a trivial  null  space. 


Example  FSAG  Four  subsets,  Archetype  G 

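Archetype G and Archetype H are both systems of m = 5 equations in n = 2
variables. They have identical coefficient matrices, which we will denote here as the
matrix G,

\begin{equation*}
G = \begin{bmatrix}
2 & 3 \\
-1 & 4 \\
3 & 10 \\
3 & -1 \\
6 & 9
\end{bmatrix}
\end{equation*}

Adjoin the $5 \times 5$ identity matrix, $I_5$, to form

\begin{equation*}
M = \begin{bmatrix}
2 & 3 & 1 & 0 & 0 & 0 & 0 \\
-1 & 4 & 0 & 1 & 0 & 0 & 0 \\
3 & 10 & 0 & 0 & 1 & 0 & 0 \\
3 & -1 & 0 & 0 & 0 & 1 & 0 \\
6 & 9 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\end{equation*}

This row-reduces to

\begin{equation*}
N = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & \frac{3}{11} & \frac{1}{33} \\
0 & 1 & 0 & 0 & 0 & -\frac{2}{11} & \frac{1}{11} \\
0 & 0 & 1 & 0 & 0 & 0 & -\frac{1}{3} \\
0 & 0 & 0 & 1 & 0 & 1 & -\frac{1}{3} \\
0 & 0 & 0 & 0 & 1 & 1 & -1
\end{bmatrix}
\end{equation*}

The first n = 2 columns contain r = 2 leading 1's, so we obtain C as the $2 \times 2$
identity matrix and extract L from the final $m - r = 3$ rows in the final m = 5
columns.

\begin{equation*}
C = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\qquad
L = \begin{bmatrix}
1 & 0 & 0 & 0 & -\frac{1}{3} \\
0 & 1 & 0 & 1 & -\frac{1}{3} \\
0 & 0 & 1 & 1 & -1
\end{bmatrix}
\end{equation*}

Then we apply Theorem FS,

\begin{align*}
\mathcal{N}(G) = \mathcal{N}(C) &= \left\langle\, \emptyset \,\right\rangle = \{\mathbf{0}\} && \text{Theorem BNS} \\
\mathcal{R}(G) = \mathcal{R}(C) &= \left\langle \left\{
\begin{bmatrix} 1 \\ 0 \end{bmatrix},\
\begin{bmatrix} 0 \\ 1 \end{bmatrix}
\right\} \right\rangle = \mathbb{C}^2 && \text{Theorem BRS} \\
\mathcal{C}(G) = \mathcal{N}(L) &= \left\langle \left\{
\begin{bmatrix} 0 \\ -1 \\ -1 \\ 1 \\ 0 \end{bmatrix},\
\begin{bmatrix} \frac{1}{3} \\ \frac{1}{3} \\ 1 \\ 0 \\ 1 \end{bmatrix}
\right\} \right\rangle
= \left\langle \left\{
\begin{bmatrix} 0 \\ -1 \\ -1 \\ 1 \\ 0 \end{bmatrix},\
\begin{bmatrix} 1 \\ 1 \\ 3 \\ 0 \\ 3 \end{bmatrix}
\right\} \right\rangle && \text{Theorem BNS} \\
\mathcal{L}(G) = \mathcal{R}(L) &= \left\langle \left\{
\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ -\frac{1}{3} \end{bmatrix},\
\begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \\ -\frac{1}{3} \end{bmatrix},\
\begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \\ -1 \end{bmatrix}
\right\} \right\rangle
= \left\langle \left\{
\begin{bmatrix} 3 \\ 0 \\ 0 \\ 0 \\ -1 \end{bmatrix},\
\begin{bmatrix} 0 \\ 3 \\ 0 \\ 3 \\ -1 \end{bmatrix},\
\begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \\ -1 \end{bmatrix}
\right\} \right\rangle && \text{Theorem BRS}
\end{align*}

As mentioned earlier, Archetype G is consistent, while Archetype H is inconsistent.
See if you can write the two different vectors of constants from these two archetypes
as linear combinations of the two vectors that form the spanning set for $\mathcal{C}(G)$. How
about the two columns of G? Can you write each individually as a linear combination
of the two vectors that form the spanning set for $\mathcal{C}(G)$? They must be in the column
space of G also. Are your answers unique? Do you notice anything about the scalars
that appear in the linear combinations you are forming? △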

Example  COV  and  Example  CSROI  each  describes  the  column  space  of  the 
coefficient  matrix  from  Archetype  I as  the  span  of  a set  of  r = 3 linearly  independent 
vectors.  It  is  no  accident  that  these  two  different  sets  both  have  the  same  size.  If  we 
(you?)  were  to  calculate  the  column  space  of  this  matrix  using  the  null  space  of  the 
matrix  L from  Theorem  FS  then  we  would  again  find  a set  of  3 linearly  independent 
vectors  that  span  the  range.  More  on  this  later. 

So  we  have  three  different  methods  to  obtain  a description  of  the  column  space  of 
a matrix  as  the  span  of  a linearly  independent  set.  Theorem  BCS  is  sometimes  useful 
since  the  vectors  it  specifies  are  equal  to  actual  columns  of  the  matrix.  Theorem  BRS 
and Theorem CSRST combine to create vectors with lots of zeros, and strategically
placed 1's near the top of the vector. Theorem FS and the matrix L from the
extended echelon form give us a third method, which tends to create vectors with
lots of zeros, and strategically placed 1's near the bottom of the vector. If we do not
care  about  linear  independence  we  can  also  appeal  to  Definition  CSM  and  simply 
express  the  column  space  as  the  span  of  all  the  columns  of  the  matrix,  giving  us  a 
fourth  description. 

With  Theorem  CSRST  and  Definition  RSM,  we  can  compute  column  spaces 
with  theorems  about  row  spaces,  and  we  can  compute  row  spaces  with  theorems 
about  column  spaces,  but  in  each  case  we  must  transpose  the  matrix  first.  At  this 
point  you  may  be  overwhelmed  by  all  the  possibilities  for  computing  column  and 
row  spaces.  Diagram  CSRST  is  meant  to  help.  For  both  the  column  space  and  row 
space,  it  suggests  four  techniques.  One  is  to  appeal  to  the  definition,  another  yields  a 
span  of  a linearly  independent  set,  and  a third  uses  Theorem  FS.  A fourth  suggests 
transposing  the  matrix  and  the  dashed  line  implies  that  then  the  companion  set  of 
techniques  can  be  applied.  This  can  lead  to  a bit  of  silliness,  since  if  you  were  to 
follow  the  dashed  lines  twice  you  would  transpose  the  matrix  twice,  and  by  Theorem 
TT  would  accomplish  nothing  productive. 
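$\mathcal{C}(A)$:
    Definition CSM
    Theorem BCS
    Theorem FS, $\mathcal{N}(L)$
    Theorem CSRST, $\mathcal{R}(A^t)$    (dashed line to the techniques for $\mathcal{R}(A)$)

$\mathcal{R}(A)$:
    Definition RSM, $\mathcal{C}(A^t)$    (dashed line to the techniques for $\mathcal{C}(A)$)
    Theorem FS, $\mathcal{R}(C)$
    Theorem BRS
    Definition RSM

Diagram CSRST: Column Space and Row Space Techniques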

Although  we  have  many  ways  to  describe  a column  space,  notice  that  one  tempting 
strategy  will  usually  fail.  It  is  not  possible  to  simply  row-reduce  a matrix  directly 
and  then  use  the  columns  of  the  row-reduced  matrix  as  a set  whose  span  equals 
the  column  space.  In  other  words,  row  operations  do  not  preserve  column  spaces 
(however  row  operations  do  preserve  row  spaces,  Theorem  REMRS).  See  Exercise 
CRS.M21. 
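
A tiny instance shows why this strategy fails. The sketch below is an illustration with a hypothetical $2 \times 2$ matrix (not taken from the text or from Exercise CRS.M21); the helper `columns` is an ad hoc name. A single row operation brings A to reduced row-echelon form B, after which every column of B has second entry zero, while a column of A does not.

```python
# Hypothetical 2 x 2 example, not from the text.
A = [[1, 2],
     [2, 4]]

# One row operation, R2 := R2 - 2*R1, already produces the reduced row-echelon form.
B = [A[0][:], [a2 - 2 * a1 for a1, a2 in zip(A[0], A[1])]]

def columns(M):
    """Return the columns of M as tuples."""
    return [tuple(row[j] for row in M) for j in range(len(M[0]))]

# Every column of B has second entry 0, so every vector in their span does too.
span_of_B_columns_has_zero_second_entry = all(col[1] == 0 for col in columns(B))

# The first column of A has a nonzero second entry, so it lies outside that span.
first_column_of_A = columns(A)[0]
column_escapes = span_of_B_columns_has_zero_second_entry and first_column_of_A[1] != 0
```

Since `column_escapes` is true, the span of the columns of B cannot equal the column space of A. The row spaces do agree here, since the second row of A is twice the first row, as Theorem REMRS predicts.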


Reading  Questions 


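1. Find a nontrivial element of the left null space of A.

\begin{equation*}
A = \begin{bmatrix}
2 & 1 & -3 & 4 \\
-1 & -1 & 2 & -1 \\
0 & -1 & 1 & 2
\end{bmatrix}
\end{equation*}

2. Find the matrices C and L in the extended echelon form of A.

\begin{equation*}
A = \begin{bmatrix}
-9 & 5 & -3 \\
2 & -1 & 1 \\
-5 & 3 & -1
\end{bmatrix}
\end{equation*}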


3.  Why  is  Theorem  FS  a great  conclusion  to  Chapter  M? 


Exercises 

C20  Example  FSAG  concludes  with  several  questions.  Perform  the  analysis  suggested  by 
these  questions. 
C25† Given the matrix A below, use the extended echelon form of A to answer each part
of this problem. In each part, find a linearly independent set of vectors, S, so that the span
of S, $\langle S \rangle$, equals the specified set of vectors.

3 -f 
1 1 
5 -1 

-2  0 _ 

1. The row space of A, $\mathcal{R}(A)$.

2. The column space of A, $\mathcal{C}(A)$.

3. The null space of A, $\mathcal{N}(A)$.

4. The left null space of A, $\mathcal{L}(A)$.

C26† For the matrix D below use the extended echelon form to find:

1. A linearly independent set whose span is the column space of D.

2. A linearly independent set whose span is the left null space of D.

\begin{equation*}
D = \begin{bmatrix}
-7 & -11 & -19 & -15 \\
6 & 10 & 18 & 14 \\
3 & 5 & 9 & 7 \\
-1 & -2 & -4 & -3
\end{bmatrix}
\end{equation*}

C41  The  following  archetypes  are  systems  of  equations.  For  each  system,  write  the  vector 
of  constants  as  a linear  combination  of  the  vectors  in  the  span  construction  for  the  column 
space  provided  by  Theorem  FS  and  Theorem  BNS  (these  vectors  are  listed  for  each  of  these 
archetypes) . 

Archetype  A,  Archetype  B,  Archetype  C,  Archetype  D,  Archetype  E,  Archetype  F,  Archetype 
G,  Archetype  H,  Archetype  I,  Archetype  J 

C43  The  following  archetypes  are  either  matrices  or  systems  of  equations  with  coefficient 
matrices.  For  each  matrix,  compute  the  extended  echelon  form  N and  identify  the  matrices 
C and  L.  Using  Theorem  FS,  Theorem  BNS  and  Theorem  BRS  express  the  null  space,  the 
row  space,  the  column  space  and  left  null  space  of  each  coefficient  matrix  as  a span  of  a 
linearly  independent  set. 

Archetype  A,  Archetype  B,  Archetype  C,  Archetype  D/Archetype  E,  Archetype  F,  Archetype 
G/Archetype  H,  Archetype  I,  Archetype  J,  Archetype  K,  Archetype  L 

C60† For the matrix B below, find sets of vectors whose span equals the column space of
B ($\mathcal{C}(B)$) and which individually meet the following extra requirements.

1.  The  set  illustrates  the  definition  of  the  column  space. 


of  S',  ( S ),  equals  the  specified  set  of  vectors. 

r-5 


3 
2. The set is linearly independent and the members of the set are columns of B.

3.  The  set  is  linearly  independent  with  a “nice  pattern  of  zeros  and  ones”  at  the  top  of 
each  vector. 

4.  The  set  is  linearly  independent  with  a “nice  pattern  of  zeros  and  ones”  at  the  bottom 
of  each  vector. 


\begin{equation*}
B = \begin{bmatrix}
2 & 3 & 1 & 1 \\
1 & 1 & 0 & 1 \\
-1 & 2 & 3 & -4
\end{bmatrix}
\end{equation*}

C61† Let A be the matrix below, and find the indicated sets with the requested properties.

\begin{equation*}
A = \begin{bmatrix}
2 & -1 & 5 & -3 \\
-5 & 3 & -12 & 7 \\
1 & 1 & 4 & -3
\end{bmatrix}
\end{equation*}

1. A linearly independent set S so that $\mathcal{C}(A) = \langle S \rangle$ and S is composed of columns of A.

2. A linearly independent set S so that $\mathcal{C}(A) = \langle S \rangle$ and the vectors in S have a nice
pattern of zeros and ones at the top of the vectors.

3. A linearly independent set S so that $\mathcal{C}(A) = \langle S \rangle$ and the vectors in S have a nice
pattern of zeros and ones at the bottom of the vectors.

4. A linearly independent set S so that $\mathcal{R}(A) = \langle S \rangle$.


M50  Suppose  that  A is  a nonsingular  matrix.  Extend  the  four  conclusions  of  Theorem 
FS  in  this  special  case  and  discuss  connections  with  previous  results  (such  as  Theorem 
NME4). 

M51  Suppose  that  A is  a singular  matrix.  Extend  the  four  conclusions  of  Theorem  FS  in 
this  special  case  and  discuss  connections  with  previous  results  (such  as  Theorem  NME4). 


Chapter  VS 
Vector  Spaces 


We  now  have  a computational  toolkit  in  place  and  so  we  can  begin  our  study  of 
linear  algebra  at  a more  theoretical  level. 

Linear  algebra  is  the  study  of  two  fundamental  objects,  vector  spaces  and  linear 
transformations  (see  Chapter  LT).  This  chapter  will  focus  on  the  former.  The  power 
of  mathematics  is  often  derived  from  generalizing  many  different  situations  into  one 
abstract  formulation,  and  that  is  exactly  what  we  will  be  doing  throughout  this 
chapter. 


Section  VS 
Vector  Spaces 

In  this  section  we  present  a formal  definition  of  a vector  space,  which  will  lead  to  an 
extra  increment  of  abstraction.  Once  defined,  we  study  its  most  basic  properties. 

Subsection  VS 
Vector  Spaces 

Here  is  one  of  the  two  most  important  definitions  in  the  entire  course. 

Definition  VS  Vector  Space 

Suppose  that  V is  a set  upon  which  we  have  defined  two  operations:  (1)  vector 
addition,  which  combines  two  elements  of  V and  is  denoted  by  “+” , and  (2)  scalar 
multiplication,  which  combines  a complex  number  with  an  element  of  V and  is 
denoted  by  juxtaposition.  Then  V,  along  with  the  two  operations,  is  a vector  space 
over  C if  the  following  ten  properties  hold. 

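• AC Additive Closure

If $\mathbf{u}, \mathbf{v} \in V$, then $\mathbf{u} + \mathbf{v} \in V$.

• SC Scalar Closure

If $\alpha \in \mathbb{C}$ and $\mathbf{u} \in V$, then $\alpha\mathbf{u} \in V$.

• C Commutativity

If $\mathbf{u}, \mathbf{v} \in V$, then $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$.

• AA Additive Associativity

If $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$, then $\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$.

• Z Zero Vector

There is a vector, $\mathbf{0}$, called the zero vector, such that $\mathbf{u} + \mathbf{0} = \mathbf{u}$ for all $\mathbf{u} \in V$.

• AI Additive Inverses

If $\mathbf{u} \in V$, then there exists a vector $-\mathbf{u} \in V$ so that $\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$.

• SMA Scalar Multiplication Associativity

If $\alpha, \beta \in \mathbb{C}$ and $\mathbf{u} \in V$, then $\alpha(\beta\mathbf{u}) = (\alpha\beta)\mathbf{u}$.

• DVA Distributivity across Vector Addition

If $\alpha \in \mathbb{C}$ and $\mathbf{u}, \mathbf{v} \in V$, then $\alpha(\mathbf{u} + \mathbf{v}) = \alpha\mathbf{u} + \alpha\mathbf{v}$.

• DSA Distributivity across Scalar Addition

If $\alpha, \beta \in \mathbb{C}$ and $\mathbf{u} \in V$, then $(\alpha + \beta)\mathbf{u} = \alpha\mathbf{u} + \beta\mathbf{u}$.

• O One

If $\mathbf{u} \in V$, then $1\mathbf{u} = \mathbf{u}$.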

The  objects  in  V are  called  vectors,  no  matter  what  else  they  might  really  be, 
simply  by  virtue  of  being  elements  of  a vector  space.  □ 

Now,  there  are  several  important  observations  to  make.  Many  of  these  will  be 
easier  to  understand  on  a second  or  third  reading,  and  especially  after  carefully 
studying  the  examples  in  Subsection  VS.EVS. 

An  axiom  is  often  a “self-evident”  truth.  Something  so  fundamental  that  we 
all  agree  it  is  true  and  accept  it  without  proof.  Typically,  it  would  be  the  logical 
underpinning  that  we  would  begin  to  build  theorems  upon.  Some  might  refer  to 
the  ten  properties  of  Definition  VS  as  axioms,  implying  that  a vector  space  is  a 
very  natural  object  and  the  ten  properties  are  the  essence  of  a vector  space.  We 
will  instead  emphasize  that  we  will  begin  with  a definition  of  a vector  space.  After 
studying  the  remainder  of  this  chapter,  you  might  return  here  and  remind  yourself 
how  all  our  forthcoming  theorems  and  definitions  rest  on  this  foundation. 

As we will see shortly, the objects in V can be anything, even though we will
call them vectors. We have been working with vectors frequently, but we should
stress here that these have so far just been column vectors: scalars arranged in a
columnar  list  of  fixed  length.  In  a similar  vein,  you  have  used  the  symbol  “+”  for 
many  years  to  represent  the  addition  of  numbers  (scalars).  We  have  extended  its 
use  to  the  addition  of  column  vectors  and  to  the  addition  of  matrices,  and  now  we 
are  going  to  recycle  it  even  further  and  let  it  denote  vector  addition  in  any  possible 
vector  space.  So  when  describing  a new  vector  space,  we  will  have  to  define  exactly 
what  “+”  is.  Similar  comments  apply  to  scalar  multiplication.  Conversely,  we  can 
define  our  operations  any  way  we  like,  so  long  as  the  ten  properties  are  fulfilled  (see 
Example  CVS). 

In Definition VS, the scalars do not have to be complex numbers. They can come
from what are called in more advanced mathematics, “fields”. Examples of fields are
the set of complex numbers, the set of real numbers, the set of rational numbers,
and even the finite set of “binary numbers”, {0, 1}. There are many, many others.
When the scalars come from a field F, we call V a vector space over (the field) F.
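
To get a feel for scalars other than complex numbers, the sketch below builds lists of three “binary numbers” and adds and scales them entry by entry. It assumes that arithmetic on {0, 1} is carried out modulo 2, which the text does not spell out here; the function names `add` and `scale` are likewise illustrative.

```python
from itertools import product

def add(u, v):
    """Entry-wise addition, with each entry reduced modulo 2 (an assumption)."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def scale(alpha, u):
    """Multiply every entry by the scalar alpha, modulo 2."""
    return tuple((alpha * a) % 2 for a in u)

# All lists of length 3 with entries from {0, 1}: a finite set of 8 vectors.
vectors = list(product((0, 1), repeat=3))
zero = (0, 0, 0)

# Spot-checks of Property AC, Property SC and Property AI on this finite set.
closed_under_addition = all(add(u, v) in vectors for u in vectors for v in vectors)
closed_under_scaling = all(scale(a, u) in vectors for a in (0, 1) for u in vectors)
each_is_own_inverse = all(add(u, u) == zero for u in vectors)
```

One curiosity of this setting is that every vector is its own additive inverse. These checks are only spot-checks of three properties, not a proof that all ten properties hold.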

A vector  space  is  composed  of  three  objects,  a set  and  two  operations.  Some 
would  explicitly  state  in  the  definition  that  V must  be  a nonempty  set,  but  we  can 
infer  this  from  Property  Z,  since  the  set  cannot  be  empty  and  contain  a vector  that 
behaves  as  the  zero  vector.  Also,  we  usually  use  the  same  symbol  for  both  the  set 
and  the  vector  space  itself.  Do  not  let  this  convenience  fool  you  into  thinking  the 
operations  are  secondary! 

This  discussion  has  either  convinced  you  that  we  are  really  embarking  on  a new 
level  of  abstraction,  or  it  has  seemed  cryptic,  mysterious  or  nonsensical.  You  might 
want  to  return  to  this  section  in  a few  days  and  give  it  another  read  then.  In  any 
case,  let  us  look  at  some  concrete  examples  now. 

Subsection  EVS 
Examples  of  Vector  Spaces 

Our  aim  in  this  subsection  is  to  give  you  a storehouse  of  examples  to  work  with,  to 
become  comfortable  with  the  ten  vector  space  properties  and  to  convince  you  that 
the  multitude  of  examples  justifies  (at  least  initially)  making  such  a broad  definition 
as  Definition  VS.  Some  of  our  claims  will  be  justified  by  reference  to  previous 
theorems,  we  will  prove  some  facts  from  scratch,  and  we  will  do  one  nontrivial 
example  completely.  In  other  places,  our  usual  thoroughness  will  be  neglected,  so 
grab  paper  and  pencil  and  play  along. 

Example VSCV The vector space $\mathbb{C}^m$

Set: $\mathbb{C}^m$, all column vectors of size m, Definition VSCV.

Equality: Entry-wise, Definition CVE.

Vector Addition: The “usual” addition, given in Definition CVA.

Scalar Multiplication: The “usual” scalar multiplication, given in Definition
CVSM.

Does this set with these operations fulfill the ten properties? Yes. And by design
all we need to do is quote Theorem VSPCV. That was easy. △

Example VSM The vector space of matrices, $M_{mn}$

Set: $M_{mn}$, the set of all matrices of size $m \times n$ and entries from $\mathbb{C}$, Definition VSM.

Equality: Entry-wise, Definition ME.

Vector Addition: The “usual” addition, given in Definition MA.

Scalar Multiplication: The “usual” scalar multiplication, given in Definition MSM.

Does this set with these operations fulfill the ten properties? Yes. And all we
need to do is quote Theorem VSPM. Another easy one (by design). △

So,  the  set  of  all  matrices  of  a fixed  size  forms  a vector  space.  That  entitles  us  to 
call a matrix a vector, since a matrix is an element of a vector space. For example, if
$A, B \in M_{34}$ then we call A and B “vectors,” and we even use our previous notation
for column vectors to refer to A and B. So we could legitimately write expressions
like

$\mathbf{u} + \mathbf{v} = A + B = B + A = \mathbf{v} + \mathbf{u}$

This  could  lead  to  some  confusion,  but  it  is  not  too  great  a danger.  But  it  is  worth 
comment. 

The  previous  two  examples  may  be  less  than  satisfying.  We  made  all  the  relevant 
definitions  long  ago.  And  the  required  verifications  were  all  handled  by  quoting  old 
theorems.  However,  it  is  important  to  consider  these  two  examples  first.  We  have 
been  studying  vectors  and  matrices  carefully  (Chapter  V,  Chapter  M),  and  both 
objects,  along  with  their  operations,  have  certain  properties  in  common,  as  you 
may  have  noticed  in  comparing  Theorem  VSPCV  with  Theorem  VSPM.  Indeed, 
it  is  these  two  theorems  that  motivate  us  to  formulate  the  abstract  definition  of  a 
vector  space,  Definition  VS.  Now,  if  we  prove  some  general  theorems  about  vector 
spaces  (as  we  will  shortly  in  Subsection  VS.VSP),  we  can  then  instantly  apply  the 
conclusions to both $\mathbb{C}^m$ and $M_{mn}$. Notice too, how we have taken six definitions and
two  theorems  and  reduced  them  down  to  two  examples.  With  greater  generalization 
and  abstraction  our  old  ideas  get  downgraded  in  stature. 

Let  us  look  at  some  more  examples,  now  considering  some  new  vector  spaces. 

Example VSP The vector space of polynomials, $P_n$

Set: $P_n$, the set of all polynomials of degree n or less in the variable x with coefficients
from $\mathbb{C}$.

Equality:

$a_0 + a_1x + a_2x^2 + \cdots + a_nx^n = b_0 + b_1x + b_2x^2 + \cdots + b_nx^n$
if and only if $a_i = b_i$ for $0 \le i \le n$

Vector Addition:

$(a_0 + a_1x + a_2x^2 + \cdots + a_nx^n) + (b_0 + b_1x + b_2x^2 + \cdots + b_nx^n) =$
$(a_0 + b_0) + (a_1 + b_1)x + (a_2 + b_2)x^2 + \cdots + (a_n + b_n)x^n$

Scalar Multiplication:

$\alpha(a_0 + a_1x + a_2x^2 + \cdots + a_nx^n) = (\alpha a_0) + (\alpha a_1)x + (\alpha a_2)x^2 + \cdots + (\alpha a_n)x^n$

This set, with these operations, will fulfill the ten properties, though we will not
work all the details here. However, we will make a few comments and prove one of
the properties. First, the zero vector (Property Z) is what you might expect, and
you can check that it has the required property.

$\mathbf{0} = 0 + 0x + 0x^2 + \cdots + 0x^n$

The additive inverse (Property AI) is also no surprise, though consider how we
have chosen to write it.

$-(a_0 + a_1x + a_2x^2 + \cdots + a_nx^n) = (-a_0) + (-a_1)x + (-a_2)x^2 + \cdots + (-a_n)x^n$

Now  let  us  prove  the  associativity  of  vector  addition  (Property  AA).  This  is  a 
bit  tedious,  though  necessary.  Throughout,  the  plus  sign  (“+”)  does  triple-duty.  You 
might  ask  yourself  what  each  plus  sign  represents  as  you  work  through  this  proof. 

\begin{align*}
&\mathbf{u} + (\mathbf{v} + \mathbf{w}) \\
&= (a_0 + a_1x + \cdots + a_nx^n) + ((b_0 + b_1x + \cdots + b_nx^n) + (c_0 + c_1x + \cdots + c_nx^n)) \\
&= (a_0 + a_1x + \cdots + a_nx^n) + ((b_0 + c_0) + (b_1 + c_1)x + \cdots + (b_n + c_n)x^n) \\
&= (a_0 + (b_0 + c_0)) + (a_1 + (b_1 + c_1))x + \cdots + (a_n + (b_n + c_n))x^n \\
&= ((a_0 + b_0) + c_0) + ((a_1 + b_1) + c_1)x + \cdots + ((a_n + b_n) + c_n)x^n \\
&= ((a_0 + b_0) + (a_1 + b_1)x + \cdots + (a_n + b_n)x^n) + (c_0 + c_1x + \cdots + c_nx^n) \\
&= ((a_0 + a_1x + \cdots + a_nx^n) + (b_0 + b_1x + \cdots + b_nx^n)) + (c_0 + c_1x + \cdots + c_nx^n) \\
&= (\mathbf{u} + \mathbf{v}) + \mathbf{w}
\end{align*}

Notice  how  it  is  the  application  of  the  associativity  of  the  (old)  addition  of 
complex  numbers  in  the  middle  of  this  chain  of  equalities  that  makes  the  whole  proof 
happen.  The  remainder  is  successive  applications  of  our  (new)  definition  of  vector 
(polynomial)  addition.  Proving  the  remainder  of  the  ten  properties  is  similar  in  style 
and  tedium.  You  might  try  proving  the  commutativity  of  vector  addition  (Property 
C), or one of the distributivity properties (Property DVA, Property DSA). △
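The operations of Example VSP can also be tried out numerically. The sketch below is only an illustration, not part of the text's development: it assumes a polynomial in $P_2$ is stored as a list of coefficients `[a0, a1, a2]`, and the helper names `poly_add` and `poly_scale` are invented here. Checking a property on a few sample vectors can expose a mistake, but it is never a substitute for the proof.

```python
def poly_add(p, q):
    """Vector addition in P_n: add coefficients of like powers of x."""
    return [a + b for a, b in zip(p, q)]


def poly_scale(alpha, p):
    """Scalar multiplication in P_n: multiply every coefficient by alpha."""
    return [alpha * a for a in p]


# Sample vectors in P_2, stored as [a0, a1, a2]; coefficients are complex
# numbers with integer parts so that the arithmetic below is exact.
u = [1 + 2j, 0, 3]
v = [4, -1j, 2 - 1j]
w = [-2, 5, 1j]
zero = [0, 0, 0]
alpha = 2 - 3j

# Property AA on this one sample: u + (v + w) == (u + v) + w
assoc_holds = poly_add(u, poly_add(v, w)) == poly_add(poly_add(u, v), w)

# Property C and Property DVA on the same sample
comm_holds = poly_add(u, v) == poly_add(v, u)
dva_holds = poly_scale(alpha, poly_add(u, v)) == poly_add(
    poly_scale(alpha, u), poly_scale(alpha, v)
)

print(assoc_holds, comm_holds, dva_holds)
```

Each plus sign inside `poly_add` is the old addition of complex numbers, while a call to `poly_add` itself plays the role of the new vector addition.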

Example VSIS The vector space of infinite sequences
Set: $\mathbb{C}^\infty = \{ (c_0, c_1, c_2, c_3, \ldots) \mid c_i \in \mathbb{C},\ i \in \mathbb{N} \}$.

Equality:

\[(c_0, c_1, c_2, \ldots) = (d_0, d_1, d_2, \ldots) \text{ if and only if } c_i = d_i \text{ for all } i \ge 0\]

Vector Addition:

\[(c_0, c_1, c_2, \ldots) + (d_0, d_1, d_2, \ldots) = (c_0 + d_0, c_1 + d_1, c_2 + d_2, \ldots)\]

Scalar Multiplication:

\[\alpha(c_0, c_1, c_2, c_3, \ldots) = (\alpha c_0, \alpha c_1, \alpha c_2, \alpha c_3, \ldots)\]

This should remind you of the vector space $\mathbb{C}^m$, though now our lists of scalars are
written horizontally with commas as delimiters and they are allowed to be infinite in
length. What does the zero vector look like (Property Z)? Additive inverses (Property
AI)? Can you prove the associativity of vector addition (Property AA)? △

Example  VSF  The  vector  space  of  functions 
Let  X be  any  set. 

Set: $F = \{ f \mid f : X \to \mathbb{C} \}$.

Equality: $f = g$ if and only if $f(x) = g(x)$ for all $x \in X$.

Vector Addition: $f + g$ is the function with outputs defined by
$(f + g)(x) = f(x) + g(x)$.

Scalar Multiplication: $\alpha f$ is the function with outputs defined by
$(\alpha f)(x) = \alpha f(x)$.

So this is the set of all functions of one variable that take elements of the set $X$
to a complex number. You might have studied functions of one variable that take a
real number to a real number, and that might be a more natural set to use as $X$.
But since we are allowing our scalars to be complex numbers, we need to specify
that the range of our functions is the complex numbers. Study carefully how the
definitions of the operations are made, and think about the different uses of “+” and
juxtaposition. As an example of what is required when verifying that this is a vector
space, consider that the zero vector (Property Z) is the function $z$ whose definition
is $z(x) = 0$ for every input $x \in X$.

Vector spaces of functions are very important in mathematics and physics, where
the field of scalars may be the real numbers, so the ranges of the functions can in
turn also be the set of real numbers. △
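The pointwise definitions in Example VSF translate almost word for word into code. The following is a minimal sketch under stated assumptions: $X$ is taken to be a small finite set so that equality of functions can be checked exhaustively, and the names `f_add`, `f_scale`, `f_equal` and the sample functions are invented for this illustration.

```python
X = [0, 1, 2, 3]  # assumed finite domain, chosen so equality can be tested


def f_add(f, g):
    """Vector addition: the function whose output at x is f(x) + g(x)."""
    return lambda x: f(x) + g(x)


def f_scale(alpha, f):
    """Scalar multiplication: the function whose output at x is alpha * f(x)."""
    return lambda x: alpha * f(x)


def f_equal(f, g):
    """Equality of functions: equal outputs for every input in X."""
    return all(f(x) == g(x) for x in X)


def z(x):
    """The zero vector: the function that sends every input to 0."""
    return 0


def f(x):
    return x * x + 1j


def g(x):
    return 2 * x - 3


alpha = 2j

zero_ok = f_equal(f_add(f, z), f)                        # Property Z
comm_ok = f_equal(f_add(f, g), f_add(g, f))              # Property C
dva_ok = f_equal(
    f_scale(alpha, f_add(f, g)),
    f_add(f_scale(alpha, f), f_scale(alpha, g)),
)                                                        # Property DVA

print(zero_ok, comm_ok, dva_ok)
```

Notice that `f_add` returns a new function rather than a number, which mirrors the point that $f + g$ is itself a vector in $F$.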

Here  is  a unique  example. 

Example  VSS  The  singleton  vector  space 
Set:  Z = {z}. 

Equality:  Huh? 

Vector  Addition:  z + z = z. 

Scalar  Multiplication:  az  = z. 

This  should  look  pretty  wild.  First,  just  what  is  z?  Column  vector,  matrix, 
polynomial,  sequence,  function?  Mineral,  plant,  or  animal?  We  aren’t  saying!  z just 
is.  And  we  have  definitions  of  vector  addition  and  scalar  multiplication  that  are 
sufficient  for  an  occurrence  of  either  that  may  come  along. 

Our  only  concern  is  if  this  set,  along  with  the  definitions  of  two  operations,  fulfills 
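the ten properties of Definition VS. Let us check associativity of vector addition
(Property AA). For all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in Z$,

\begin{align*}
\mathbf{u} + (\mathbf{v} + \mathbf{w}) &= \mathbf{z} + (\mathbf{z} + \mathbf{z}) \\
&= \mathbf{z} + \mathbf{z} \\
&= (\mathbf{z} + \mathbf{z}) + \mathbf{z} \\
&= (\mathbf{u} + \mathbf{v}) + \mathbf{w}
\end{align*}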

What  is  the  zero  vector  in  this  vector  space  (Property  Z)?  With  only  one  element 
in  the  set,  we  do  not  have  much  choice.  Is  z = 0?  It  appears  that  z behaves  like  the 
zero  vector  should,  so  it  gets  the  title.  Maybe  now  the  definition  of  this  vector  space 
does  not  seem  so  bizarre.  It  is  a set  whose  only  element  is  the  element  that  behaves 
like the zero vector, so that lone element is the zero vector. △

Perhaps  some  of  the  above  definitions  and  verifications  seem  obvious  or  like 
splitting  hairs,  but  the  next  example  should  convince  you  that  they  are  necessary. 
We  will  study  this  one  carefully.  Ready?  Check  your  preconceptions  at  the  door. 

Example  CVS  The  crazy  vector  space 
Set: $C = \{ (x_1, x_2) \mid x_1, x_2 \in \mathbb{C} \}$.

Vector Addition: $(x_1, x_2) + (y_1, y_2) = (x_1 + y_1 + 1,\ x_2 + y_2 + 1)$.

Scalar Multiplication: $\alpha(x_1, x_2) = (\alpha x_1 + \alpha - 1,\ \alpha x_2 + \alpha - 1)$.

Now,  the  first  thing  I hear  you  say  is  “You  can’t  do  that!”  And  my  response  is, 
“Oh  yes,  I can!”  I am  free  to  define  my  set  and  my  operations  any  way  I please.  They 
may  not  look  natural,  or  even  useful,  but  we  will  now  verify  that  they  provide  us 
with  another  example  of  a vector  space.  And  that  is  enough.  If  you  are  adventurous, 
you  might  try  first  checking  some  of  the  properties  yourself.  What  is  the  zero  vector? 
Additive  inverses?  Can  you  prove  associativity?  Ready,  here  we  go. 

Property  AC,  Property  SC:  The  result  of  each  operation  is  a pair  of  complex 
numbers,  so  these  two  closure  properties  are  fulfilled. 

Property C:

\begin{align*}
\mathbf{u} + \mathbf{v} &= (x_1, x_2) + (y_1, y_2) = (x_1 + y_1 + 1,\ x_2 + y_2 + 1) \\
&= (y_1 + x_1 + 1,\ y_2 + x_2 + 1) = (y_1, y_2) + (x_1, x_2) \\
&= \mathbf{v} + \mathbf{u}
\end{align*}

Property AA:

\begin{align*}
\mathbf{u} + (\mathbf{v} + \mathbf{w}) &= (x_1, x_2) + ((y_1, y_2) + (z_1, z_2)) \\
&= (x_1, x_2) + (y_1 + z_1 + 1,\ y_2 + z_2 + 1) \\
&= (x_1 + (y_1 + z_1 + 1) + 1,\ x_2 + (y_2 + z_2 + 1) + 1) \\
&= (x_1 + y_1 + z_1 + 2,\ x_2 + y_2 + z_2 + 2) \\
&= ((x_1 + y_1 + 1) + z_1 + 1,\ (x_2 + y_2 + 1) + z_2 + 1) \\
&= (x_1 + y_1 + 1,\ x_2 + y_2 + 1) + (z_1, z_2) \\
&= ((x_1, x_2) + (y_1, y_2)) + (z_1, z_2) \\
&= (\mathbf{u} + \mathbf{v}) + \mathbf{w}
\end{align*}

Property Z: The zero vector is . . . $\mathbf{0} = (-1, -1)$. Now I hear you say, “No, no,
that can’t be, it must be (0, 0)!”. Indulge me for a moment and let us check my
proposal.

\[\mathbf{u} + \mathbf{0} = (x_1, x_2) + (-1, -1) = (x_1 + (-1) + 1,\ x_2 + (-1) + 1) = (x_1, x_2) = \mathbf{u}\]

Feeling better? Or worse?

Property AI: For each vector, $\mathbf{u}$, we must locate an additive inverse, $-\mathbf{u}$. Here it
is, $-(x_1, x_2) = (-x_1 - 2,\ -x_2 - 2)$. As odd as it may look, I hope you are withholding
judgment. Check:

\begin{align*}
\mathbf{u} + (-\mathbf{u}) &= (x_1, x_2) + (-x_1 - 2,\ -x_2 - 2) \\
&= (x_1 + (-x_1 - 2) + 1,\ x_2 + (-x_2 - 2) + 1) = (-1, -1) = \mathbf{0}
\end{align*}

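Property SMA:

\begin{align*}
\alpha(\beta\mathbf{u}) &= \alpha(\beta(x_1, x_2)) \\
&= \alpha(\beta x_1 + \beta - 1,\ \beta x_2 + \beta - 1) \\
&= (\alpha(\beta x_1 + \beta - 1) + \alpha - 1,\ \alpha(\beta x_2 + \beta - 1) + \alpha - 1) \\
&= ((\alpha\beta x_1 + \alpha\beta - \alpha) + \alpha - 1,\ (\alpha\beta x_2 + \alpha\beta - \alpha) + \alpha - 1) \\
&= (\alpha\beta x_1 + \alpha\beta - 1,\ \alpha\beta x_2 + \alpha\beta - 1) \\
&= (\alpha\beta)(x_1, x_2) \\
&= (\alpha\beta)\mathbf{u}
\end{align*}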

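Property DVA: If you have hung on so far, here is where it gets even wilder. In
the next two properties we mix and mash the two operations.

\begin{align*}
\alpha(\mathbf{u} + \mathbf{v}) &= \alpha((x_1, x_2) + (y_1, y_2)) \\
&= \alpha(x_1 + y_1 + 1,\ x_2 + y_2 + 1) \\
&= (\alpha(x_1 + y_1 + 1) + \alpha - 1,\ \alpha(x_2 + y_2 + 1) + \alpha - 1) \\
&= (\alpha x_1 + \alpha y_1 + \alpha + \alpha - 1,\ \alpha x_2 + \alpha y_2 + \alpha + \alpha - 1) \\
&= (\alpha x_1 + \alpha - 1 + \alpha y_1 + \alpha - 1 + 1,\ \alpha x_2 + \alpha - 1 + \alpha y_2 + \alpha - 1 + 1) \\
&= ((\alpha x_1 + \alpha - 1) + (\alpha y_1 + \alpha - 1) + 1,\ (\alpha x_2 + \alpha - 1) + (\alpha y_2 + \alpha - 1) + 1) \\
&= (\alpha x_1 + \alpha - 1,\ \alpha x_2 + \alpha - 1) + (\alpha y_1 + \alpha - 1,\ \alpha y_2 + \alpha - 1) \\
&= \alpha(x_1, x_2) + \alpha(y_1, y_2) \\
&= \alpha\mathbf{u} + \alpha\mathbf{v}
\end{align*}
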
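Property DSA:

\begin{align*}
(\alpha + \beta)\mathbf{u} &= (\alpha + \beta)(x_1, x_2) \\
&= ((\alpha + \beta)x_1 + (\alpha + \beta) - 1,\ (\alpha + \beta)x_2 + (\alpha + \beta) - 1) \\
&= (\alpha x_1 + \beta x_1 + \alpha + \beta - 1,\ \alpha x_2 + \beta x_2 + \alpha + \beta - 1) \\
&= (\alpha x_1 + \alpha - 1 + \beta x_1 + \beta - 1 + 1,\ \alpha x_2 + \alpha - 1 + \beta x_2 + \beta - 1 + 1) \\
&= ((\alpha x_1 + \alpha - 1) + (\beta x_1 + \beta - 1) + 1,\ (\alpha x_2 + \alpha - 1) + (\beta x_2 + \beta - 1) + 1) \\
&= (\alpha x_1 + \alpha - 1,\ \alpha x_2 + \alpha - 1) + (\beta x_1 + \beta - 1,\ \beta x_2 + \beta - 1) \\
&= \alpha(x_1, x_2) + \beta(x_1, x_2) \\
&= \alpha\mathbf{u} + \beta\mathbf{u}
\end{align*}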

Property O: After all that, this one is easy, but no less pleasing.

\[1\mathbf{u} = 1(x_1, x_2) = (x_1 + 1 - 1,\ x_2 + 1 - 1) = (x_1, x_2) = \mathbf{u}\]

That  is  it,  C is  a vector  space,  as  crazy  as  that  may  seem. 
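If the algebra above feels slippery, it can help to let a computer apply the two crazy operations to a few concrete vectors. The sketch below is an illustration only: the names `cadd`, `csmul`, `cneg` and `check_properties` are invented here, and the sample vectors and scalars are arbitrary choices with integer real and imaginary parts so the arithmetic is exact. A passing check on samples is evidence, not proof; the proof is the chain of equalities above.

```python
def cadd(u, v):
    """Crazy vector addition: (x1 + y1 + 1, x2 + y2 + 1)."""
    return (u[0] + v[0] + 1, u[1] + v[1] + 1)


def csmul(a, u):
    """Crazy scalar multiplication: (a*x1 + a - 1, a*x2 + a - 1)."""
    return (a * u[0] + a - 1, a * u[1] + a - 1)


ZERO = (-1, -1)  # the proposed zero vector of C


def cneg(u):
    """The proposed additive inverse: (-x1 - 2, -x2 - 2)."""
    return (-u[0] - 2, -u[1] - 2)


def check_properties(u, v, w, a, b):
    """Test eight of the ten properties on one choice of vectors and scalars."""
    return {
        "C": cadd(u, v) == cadd(v, u),
        "AA": cadd(u, cadd(v, w)) == cadd(cadd(u, v), w),
        "Z": cadd(u, ZERO) == u,
        "AI": cadd(u, cneg(u)) == ZERO,
        "SMA": csmul(a, csmul(b, u)) == csmul(a * b, u),
        "DVA": csmul(a, cadd(u, v)) == cadd(csmul(a, u), csmul(a, v)),
        "DSA": csmul(a + b, u) == cadd(csmul(a, u), csmul(b, u)),
        "O": csmul(1, u) == u,
    }


results = check_properties(
    u=(3, 4j), v=(1 - 2j, 2), w=(0, -5), a=2 + 1j, b=-3j
)
print(results)
```

The two closure properties are not tested because they hold by construction: both functions always return a pair of complex numbers.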

Notice  that  in  the  case  of  the  zero  vector  and  additive  inverses,  we  only  had  to 
propose  possibilities  and  then  verify  that  they  were  the  correct  choices.  You  might 
try  to  discover  how  you  would  arrive  at  these  choices,  though  you  should  understand 
why  the  process  of  discovering  them  is  not  a necessary  component  of  the  proof  itself. 
△
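One way the two proposals might be discovered is sketched here; it is offered only as scratch work and is not part of the proof. Write the unknown zero vector as $(z_1, z_2)$ and the unknown additive inverse of $(x_1, x_2)$ as $(w_1, w_2)$ (names introduced just for this illustration), and let the definition of addition in $C$ dictate what they must be.

\begin{align*}
(x_1, x_2) + (z_1, z_2) = (x_1, x_2)
&\iff (x_1 + z_1 + 1,\ x_2 + z_2 + 1) = (x_1, x_2) \\
&\iff z_1 = -1 \text{ and } z_2 = -1 \\[1ex]
(x_1, x_2) + (w_1, w_2) = (-1, -1)
&\iff (x_1 + w_1 + 1,\ x_2 + w_2 + 1) = (-1, -1) \\
&\iff w_1 = -x_1 - 2 \text{ and } w_2 = -x_2 - 2
\end{align*}

Both results agree with the choices that were verified for Property Z and Property AI in Example CVS.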

Subsection  VSP 
Vector  Space  Properties 

Subsection  VS.EVS  has  provided  us  with  an  abundance  of  examples  of  vector  spaces, 
most  of  them  containing  useful  and  interesting  mathematical  objects  along  with 
natural  operations.  In  this  subsection  we  will  prove  some  general  properties  of 
vector  spaces.  Some  of  these  results  will  again  seem  obvious,  but  it  is  important 
to  understand  why  it  is  necessary  to  state  and  prove  them.  A typical  hypothesis 
will  be  “Let  V be  a vector  space.”  From  this  we  may  assume  the  ten  properties  of 
Definition  VS,  and  nothing  more.  It  is  like  starting  over,  as  we  learn  about  what  can 
happen  in  this  new  algebra  we  are  learning.  But  the  power  of  this  careful  approach  is 
that  we  can  apply  these  theorems  to  any  vector  space  we  encounter  — those  in  the 
previous  examples,  or  new  ones  we  have  not  yet  contemplated.  Or  perhaps  new  ones 
that  nobody  has  ever  contemplated.  We  will  illustrate  some  of  these  results  with 
examples  from  the  crazy  vector  space  (Example  CVS),  but  mostly  we  are  stating 
theorems  and  doing  proofs.  These  proofs  do  not  get  too  involved,  but  are  not  trivial 
either,  so  these  are  good  theorems  to  try  proving  yourself  before  you  study  the  proof 
given  here.  (See  Proof  Technique  P.) 

First  we  show  that  there  is  just  one  zero  vector.  Notice  that  the  properties  only 
require  there  to  be  at  least  one,  and  say  nothing  about  there  possibly  being  more. 
That  is  because  we  can  use  the  ten  properties  of  a vector  space  (Definition  VS)  to 
learn that there can never be more than one. To require that this extra condition
be stated as an eleventh property would make the definition of a vector space more
complicated  than  it  needs  to  be. 

Theorem  ZVU  Zero  Vector  is  Unique 

Suppose  that  V is  a vector  space.  The  zero  vector,  0,  is  unique. 

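Proof. To prove uniqueness, a standard technique is to suppose the existence of two
objects (Proof Technique U). So let $\mathbf{0}_1$ and $\mathbf{0}_2$ be two zero vectors in $V$. Then

\begin{align*}
\mathbf{0}_1 &= \mathbf{0}_1 + \mathbf{0}_2 && \text{Property Z for } \mathbf{0}_2 \\
&= \mathbf{0}_2 + \mathbf{0}_1 && \text{Property C} \\
&= \mathbf{0}_2 && \text{Property Z for } \mathbf{0}_1
\end{align*}

This proves the uniqueness since the two zero vectors are really the same. ■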
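
Theorem AIU Additive Inverses are Unique

Suppose that $V$ is a vector space. For each $\mathbf{u} \in V$, the additive inverse, $-\mathbf{u}$, is unique.

Proof. To prove uniqueness, a standard technique is to suppose the existence of two
objects (Proof Technique U). So let $-\mathbf{u}_1$ and $-\mathbf{u}_2$ be two additive inverses for $\mathbf{u}$.
Then

\begin{align*}
-\mathbf{u}_1 &= -\mathbf{u}_1 + \mathbf{0} && \text{Property Z} \\
&= -\mathbf{u}_1 + (\mathbf{u} + -\mathbf{u}_2) && \text{Property AI} \\
&= (-\mathbf{u}_1 + \mathbf{u}) + -\mathbf{u}_2 && \text{Property AA} \\
&= \mathbf{0} + -\mathbf{u}_2 && \text{Property AI} \\
&= -\mathbf{u}_2 && \text{Property Z}
\end{align*}

So the two additive inverses are really the same. ■
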

As  obvious  as  the  next  three  theorems  appear,  nowhere  have  we  guaranteed  that 
the  zero  scalar,  scalar  multiplication  and  the  zero  vector  all  interact  this  way.  Until 
we  have  proved  it,  anyway. 

Theorem  ZSSM  Zero  Scalar  in  Scalar  Multiplication 
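Suppose that $V$ is a vector space and $\mathbf{u} \in V$. Then $0\mathbf{u} = \mathbf{0}$.

Proof. Notice that $0$ is a scalar, $\mathbf{u}$ is a vector, so Property SC says $0\mathbf{u}$ is again a
vector. As such, $0\mathbf{u}$ has an additive inverse, $-(0\mathbf{u})$ by Property AI.

\begin{align*}
0\mathbf{u} &= \mathbf{0} + 0\mathbf{u} && \text{Property Z} \\
&= (-(0\mathbf{u}) + 0\mathbf{u}) + 0\mathbf{u} && \text{Property AI} \\
&= -(0\mathbf{u}) + (0\mathbf{u} + 0\mathbf{u}) && \text{Property AA} \\
&= -(0\mathbf{u}) + (0 + 0)\mathbf{u} && \text{Property DSA} \\
&= -(0\mathbf{u}) + 0\mathbf{u} && \text{Property ZCN} \\
&= \mathbf{0} && \text{Property AI}
\end{align*}

■
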


Here  is  another  theorem  that  looks  like  it  should  be  obvious,  but  is  still  in  need 
of  a proof. 

Theorem  ZVSM  Zero  Vector  in  Scalar  Multiplication 
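Suppose that $V$ is a vector space and $\alpha \in \mathbb{C}$. Then $\alpha\mathbf{0} = \mathbf{0}$.

Proof. Notice that $\alpha$ is a scalar, $\mathbf{0}$ is a vector, so Property SC means $\alpha\mathbf{0}$ is again a
vector. As such, $\alpha\mathbf{0}$ has an additive inverse, $-(\alpha\mathbf{0})$ by Property AI.

\begin{align*}
\alpha\mathbf{0} &= \mathbf{0} + \alpha\mathbf{0} && \text{Property Z} \\
&= (-(\alpha\mathbf{0}) + \alpha\mathbf{0}) + \alpha\mathbf{0} && \text{Property AI} \\
&= -(\alpha\mathbf{0}) + (\alpha\mathbf{0} + \alpha\mathbf{0}) && \text{Property AA} \\
&= -(\alpha\mathbf{0}) + \alpha(\mathbf{0} + \mathbf{0}) && \text{Property DVA} \\
&= -(\alpha\mathbf{0}) + \alpha\mathbf{0} && \text{Property Z} \\
&= \mathbf{0} && \text{Property AI}
\end{align*}

■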


Here  is  another  one  that  sure  looks  obvious.  But  understand  that  we  have  chosen 
to  use  certain  notation  because  it  makes  the  theorem’s  conclusion  look  so  nice.  The 
theorem  is  not  true  because  the  notation  looks  so  good;  it  still  needs  a proof.  If  we 
had really wanted to make this point, we might have used notation like $\mathbf{u}^\sharp$ for the
additive inverse of $\mathbf{u}$. Then we would have written the defining property, Property
AI, as $\mathbf{u} + \mathbf{u}^\sharp = \mathbf{0}$. This theorem would become $\mathbf{u}^\sharp = (-1)\mathbf{u}$. Not really quite as
pretty, is it?

Theorem  AISM  Additive  Inverses  from  Scalar  Multiplication 
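Suppose that $V$ is a vector space and $\mathbf{u} \in V$. Then $-\mathbf{u} = (-1)\mathbf{u}$.

Proof.

\begin{align*}
-\mathbf{u} &= -\mathbf{u} + \mathbf{0} && \text{Property Z} \\
&= -\mathbf{u} + 0\mathbf{u} && \text{Theorem ZSSM} \\
&= -\mathbf{u} + (1 + (-1))\mathbf{u} && \text{Property AICN} \\
&= -\mathbf{u} + (1\mathbf{u} + (-1)\mathbf{u}) && \text{Property DSA} \\
&= -\mathbf{u} + (\mathbf{u} + (-1)\mathbf{u}) && \text{Property O} \\
&= (-\mathbf{u} + \mathbf{u}) + (-1)\mathbf{u} && \text{Property AA} \\
&= \mathbf{0} + (-1)\mathbf{u} && \text{Property AI} \\
&= (-1)\mathbf{u} && \text{Property Z}
\end{align*}

■

Because of this theorem, we can now write linear combinations like $6\mathbf{u}_1 + (-4)\mathbf{u}_2$
as $6\mathbf{u}_1 - 4\mathbf{u}_2$, even though we have not formally defined an operation called vector
subtraction.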

Our  next  theorem  is  a bit  different  from  several  of  the  others  in  the  list.  Rather 
than making a declaration (“the zero vector is unique”) it is an implication (“if. . . ,
then. . . ”) and so can be used in proofs to convert a vector equality into two
possibilities, one a scalar equality and the other a vector equality. It should remind you of
the situation for complex numbers. If $\alpha, \beta \in \mathbb{C}$ and $\alpha\beta = 0$, then $\alpha = 0$ or $\beta = 0$.
This  critical  property  is  the  driving  force  behind  using  a factorization  to  solve  a 
polynomial  equation. 

Theorem  SMEZV  Scalar  Multiplication  Equals  the  Zero  Vector 

Suppose that $V$ is a vector space and $\alpha \in \mathbb{C}$. If $\alpha\mathbf{u} = \mathbf{0}$, then either $\alpha = 0$ or $\mathbf{u} = \mathbf{0}$.

Proof. We prove this theorem by breaking up the analysis into two cases. The first
seems too trivial, and it is, but the logic of the argument is still legitimate.

Case 1. Suppose $\alpha = 0$. In this case our conclusion is true (the first part of the
either/or is true) and we are done. That was easy.

Case 2. Suppose $\alpha \neq 0$.

\begin{align*}
\mathbf{u} &= 1\mathbf{u} && \text{Property O} \\
&= \left(\frac{1}{\alpha}\alpha\right)\mathbf{u} && \alpha \neq 0 \\
&= \frac{1}{\alpha}(\alpha\mathbf{u}) && \text{Property SMA} \\
&= \frac{1}{\alpha}(\mathbf{0}) && \text{Hypothesis} \\
&= \mathbf{0} && \text{Theorem ZVSM}
\end{align*}

So in this case, the conclusion is true (the second part of the either/or is true)
and we are done since the conclusion was true in each of the two cases. ■


Example  PCVS  Properties  for  the  Crazy  Vector  Space 

Several  of  the  above  theorems  have  interesting  demonstrations  when  applied  to  the 
crazy  vector  space,  C (Example  CVS).  We  are  not  proving  anything  new  here,  or 
learning  anything  we  did  not  know  already  about  C . It  is  just  plain  fun  to  see  how 
these  general  theorems  apply  in  a specific  instance.  For  most  of  our  examples,  the 
applications  are  obvious  or  trivial,  but  not  with  C. 

Suppose $\mathbf{u} \in C$. Then, as given by Theorem ZSSM,

\[0\mathbf{u} = 0(x_1, x_2) = (0x_1 + 0 - 1,\ 0x_2 + 0 - 1) = (-1, -1) = \mathbf{0}\]

And as given by Theorem ZVSM,

\begin{align*}
\alpha\mathbf{0} = \alpha(-1, -1) &= (\alpha(-1) + \alpha - 1,\ \alpha(-1) + \alpha - 1) \\
&= (-\alpha + \alpha - 1,\ -\alpha + \alpha - 1) = (-1, -1) = \mathbf{0}
\end{align*}

Finally, as given by Theorem AISM,

\begin{align*}
(-1)\mathbf{u} = (-1)(x_1, x_2) &= ((-1)x_1 + (-1) - 1,\ (-1)x_2 + (-1) - 1) \\
&= (-x_1 - 2,\ -x_2 - 2) = -\mathbf{u}
\end{align*}

△
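Theorem SMEZV can be watched at work in $C$ in the same spirit. The following is only an illustrative computation, using the zero vector $(-1, -1)$ and the scalar multiplication of Example CVS. Suppose $\mathbf{u} = (x_1, x_2)$ and $\alpha\mathbf{u} = \mathbf{0}$. Then

\begin{align*}
\alpha(x_1, x_2) = (-1, -1)
&\iff (\alpha x_1 + \alpha - 1,\ \alpha x_2 + \alpha - 1) = (-1, -1) \\
&\iff \alpha(x_1 + 1) = 0 \text{ and } \alpha(x_2 + 1) = 0 \\
&\iff \alpha = 0 \text{ or } (x_1, x_2) = (-1, -1)
\end{align*}

So either the scalar is zero or $\mathbf{u}$ is the zero vector of $C$, just as the theorem predicts. The last step relies on the corresponding fact for complex numbers: a product is zero only when one of its factors is zero.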


Subsection  RD 
Recycling  Definitions 

When  we  say  that  V is  a vector  space,  we  then  know  we  have  a set  of  objects  (the 
“vectors”),  but  we  also  know  we  have  been  provided  with  two  operations  (“vector 
addition”  and  “scalar  multiplication”)  and  these  operations  behave  with  these  objects 
according  to  the  ten  properties  of  Definition  VS.  One  combines  two  vectors  and 
produces  a vector,  the  other  takes  a scalar  and  a vector,  producing  a vector  as  the 
result. So if $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3 \in V$ then an expression like

\[5\mathbf{u}_1 + 7\mathbf{u}_2 - 13\mathbf{u}_3\]

would  be  unambiguous  in  any  of  the  vector  spaces  we  have  discussed  in  this  section. 
And  the  resulting  object  would  be  another  vector  in  the  vector  space.  If  you  were 
tempted  to  call  the  above  expression  a linear  combination,  you  would  be  right.  Four 
of  the  definitions  that  were  central  to  our  discussions  in  Chapter  V were  stated  in 
the  context  of  vectors  being  column  vectors,  but  were  purposely  kept  broad  enough 
that  they  could  be  applied  in  the  context  of  any  vector  space.  They  only  rely  on  the 
presence  of  scalars,  vectors,  vector  addition  and  scalar  multiplication  to  make  sense. 
We  will  restate  them  shortly,  unchanged,  except  that  their  titles  and  acronyms  no 
longer  refer  to  column  vectors,  and  the  hypothesis  of  being  in  a vector  space  has 
been  added.  Take  the  time  now  to  look  forward  and  review  each  one,  and  begin 
to  form  some  connections  to  what  we  have  done  earlier  and  what  we  will  be  doing 
in  subsequent  sections  and  chapters.  Specifically,  compare  the  following  pairs  of 
definitions: 

• Definition  LCCV  and  Definition  LC 

• Definition  SSCV  and  Definition  SS 

• Definition  RLDCV  and  Definition  RLD 

• Definition  LICV  and  Definition  LI 


Reading Questions

1. Comment on how the vector space $\mathbb{C}^m$ went from a theorem (Theorem VSPCV) to an
example (Example VSCV).

2.  In  the  crazy  vector  space,  C,  (Example  CVS)  compute  the  linear  combination 

\[2(3, 4) + (-6)(1, 2).\]

3. Suppose that $\alpha$ is a scalar and $\mathbf{0}$ is the zero vector. Why should we prove anything as
obvious as $\alpha\mathbf{0} = \mathbf{0}$ such as we did in Theorem ZVSM?


Exercises 


M10 Define a possibly new vector space by beginning with the set and vector addition
from $\mathbb{C}^2$ (Example VSCV) but change the definition of scalar multiplication to

\[\alpha\mathbf{x} = \mathbf{0} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad \alpha \in \mathbb{C},\ \mathbf{x} \in \mathbb{C}^2\]

Prove that the first nine properties required for a vector space hold, but Property O does
not hold.


This  example  shows  us  that  we  cannot  expect  to  be  able  to  derive  Property  O as  a 
consequence  of  assuming  the  first  nine  properties.  In  other  words,  we  cannot  slim  down  our 
list  of  properties  by  jettisoning  the  last  one,  and  still  have  the  same  collection  of  objects 
qualify  as  vector  spaces. 

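M11† Let $V$ be the set $\mathbb{C}^2$ with the usual vector addition, but with scalar multiplication
defined by

\[\alpha \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \alpha y \\ \alpha x \end{bmatrix}\]

Determine whether or not $V$ is a vector space with these operations.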

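M12† Let $V$ be the set $\mathbb{C}^2$ with the usual scalar multiplication, but with vector addition
defined by

\[\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} z \\ w \end{bmatrix} = \begin{bmatrix} y + w \\ x + z \end{bmatrix}\]

Determine whether or not $V$ is a vector space with these operations.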

M13† Let $V$ be the set $M_{22}$ with the usual scalar multiplication, but with addition defined
by $A + B = O_{2,2}$ for all $2 \times 2$ matrices $A$ and $B$. Determine whether or not $V$ is a vector
space with these operations.

M14† Let $V$ be the set $M_{22}$ with the usual addition, but with scalar multiplication defined
by $\alpha A = O_{2,2}$ for all $2 \times 2$ matrices $A$ and scalars $\alpha$. Determine whether or not $V$ is a
vector space with these operations.

M15† Consider the following sets of $3 \times 3$ matrices, where the symbol $*$ indicates the
position of an arbitrary complex number. Determine whether or not these sets form vector
spaces with the usual operations of addition and scalar multiplication for matrices.

1. All matrices of the form $\begin{bmatrix} * & * & 1 \\ * & 1 & * \\ 1 & * & * \end{bmatrix}$

2. All matrices of the form $\begin{bmatrix} * & 0 & * \\ 0 & * & 0 \\ * & 0 & * \end{bmatrix}$

3. All matrices of the form $\begin{bmatrix} * & 0 & 0 \\ 0 & * & 0 \\ 0 & 0 & * \end{bmatrix}$ (These are the diagonal matrices.)

4. All matrices of the form $\begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \end{bmatrix}$ (These are the upper triangular matrices.)



M20† Explain why we need to define the vector space $P_n$ as the set of all polynomials
with degree up to and including $n$ instead of the more obvious set of all polynomials of
degree exactly $n$.

M21† The set of integers is denoted $\mathbb{Z}$. Does the set $\mathbb{Z}^2 = \left\{ \begin{bmatrix} m \\ n \end{bmatrix} \,\middle|\, m, n \in \mathbb{Z} \right\}$ with the
operations of standard addition and scalar multiplication of vectors form a vector space?


T10  Prove  each  of  the  ten  properties  of  Definition  VS  for  each  of  the  following  examples 
of  a vector  space:  Example  VSP,  Example  VSIS,  Example  VSF,  Example  VSS. 


The next three problems suggest that under the right situations we can “cancel.” In practice,
these techniques should be avoided in other proofs. Prove each of the following statements.

T21† Suppose that $V$ is a vector space, and $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$. If $\mathbf{w} + \mathbf{u} = \mathbf{w} + \mathbf{v}$, then
$\mathbf{u} = \mathbf{v}$.

T22† Suppose $V$ is a vector space, $\mathbf{u}, \mathbf{v} \in V$ and $\alpha$ is a nonzero scalar from $\mathbb{C}$. If
$\alpha\mathbf{u} = \alpha\mathbf{v}$, then $\mathbf{u} = \mathbf{v}$.

T23† Suppose $V$ is a vector space, $\mathbf{u} \neq \mathbf{0}$ is a vector in $V$ and $\alpha, \beta \in \mathbb{C}$. If $\alpha\mathbf{u} = \beta\mathbf{u}$,
then $\alpha = \beta$.

T30† Suppose that $V$ is a vector space and $\alpha \in \mathbb{C}$ is a scalar such that $\alpha\mathbf{x} = \mathbf{x}$
for every $\mathbf{x} \in V$. Prove that $\alpha = 1$. In other words, Property O is not duplicated for
any other scalar but the “special” scalar, 1. (This question was suggested by James
Gallagher.)


Section  S 
Subspaces 

A subspace  is  a vector  space  that  is  contained  within  another  vector  space.  So  every 
subspace  is  a vector  space  in  its  own  right,  but  it  is  also  defined  relative  to  some 
other  (larger)  vector  space.  We  will  discover  shortly  that  we  are  already  familiar 
with  a wide  variety  of  subspaces  from  previous  sections. 


Subsection  S 
Subspaces 


Here  is  the  principal  definition  for  this  section. 

Definition  S Subspace 

Suppose that $V$ and $W$ are two vector spaces that have identical definitions of vector
addition and scalar multiplication, and that $W$ is a subset of $V$, $W \subseteq V$. Then $W$ is
a subspace of $V$. □


Let us look at an example of a vector space inside another vector space.

Example SC3 A subspace of $\mathbb{C}^3$

We know that $\mathbb{C}^3$ is a vector space (Example VSCV). Consider the subset,

\[W = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \,\middle|\, 2x_1 - 5x_2 + 7x_3 = 0 \right\}\]

It is clear that $W \subseteq \mathbb{C}^3$, since the objects in $W$ are column vectors of size 3. But
is $W$ a vector space? Does it satisfy the ten properties of Definition VS when we use
the same operations? That is the main question.


Suppose $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ and $\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$ are vectors from $W$. Then we know that these
vectors cannot be totally arbitrary, they must have gained membership in $W$ by
virtue of meeting the membership test. For example, we know that $\mathbf{x}$ must satisfy
$2x_1 - 5x_2 + 7x_3 = 0$ while $\mathbf{y}$ must satisfy $2y_1 - 5y_2 + 7y_3 = 0$. Our first property
(Property AC) asks the question, is $\mathbf{x} + \mathbf{y} \in W$? When our set of vectors was $\mathbb{C}^3$,
this was an easy question to answer. Now it is not so obvious. Notice first that

\[\mathbf{x} + \mathbf{y} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ x_3 + y_3 \end{bmatrix}\]

and we can test this vector for membership in $W$ as follows. Because $\mathbf{x} \in W$ we know
$2x_1 - 5x_2 + 7x_3 = 0$ and because $\mathbf{y} \in W$ we know $2y_1 - 5y_2 + 7y_3 = 0$. Therefore,

\begin{align*}
2(x_1 + y_1) - 5(x_2 + y_2) + 7(x_3 + y_3) &= 2x_1 + 2y_1 - 5x_2 - 5y_2 + 7x_3 + 7y_3 \\
&= (2x_1 - 5x_2 + 7x_3) + (2y_1 - 5y_2 + 7y_3) \\
&= 0 + 0 \\
&= 0
\end{align*}

and by this computation we see that $\mathbf{x} + \mathbf{y} \in W$. One property down, nine to go.

If $\alpha$ is a scalar and $\mathbf{x} \in W$, is it always true that $\alpha\mathbf{x} \in W$? This is what we need
to establish Property SC. Again, the answer is not as obvious as it was when our
set of vectors was all of $\mathbb{C}^3$. Let us see. First, note that

\[\alpha\mathbf{x} = \alpha \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \alpha x_1 \\ \alpha x_2 \\ \alpha x_3 \end{bmatrix}\]

and we can test this vector for membership in $W$. Because $\mathbf{x} \in W$
we know $2x_1 - 5x_2 + 7x_3 = 0$. Therefore,

\begin{align*}
2(\alpha x_1) - 5(\alpha x_2) + 7(\alpha x_3) &= \alpha(2x_1 - 5x_2 + 7x_3) \\
&= \alpha 0 \\
&= 0
\end{align*}

and we see that indeed $\alpha\mathbf{x} \in W$. Always.

If $W$ has a zero vector, it will be unique (Theorem ZVU). The zero vector for $\mathbb{C}^3$
should also perform the required duties when added to elements of $W$. So the likely
candidate for a zero vector in $W$ is the same zero vector that we know $\mathbb{C}^3$ has. You
can check that $\mathbf{0} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$ is a zero vector in $W$ too (Property Z).


With a zero vector, we can now ask about additive inverses (Property AI). As
you might suspect, the natural candidate for an additive inverse in $W$ is the same as
the additive inverse from $\mathbb{C}^3$. However, we must ensure that these additive inverses
actually are elements of $W$. Given $\mathbf{x} \in W$, is $-\mathbf{x} \in W$?

\[-\mathbf{x} = \begin{bmatrix} -x_1 \\ -x_2 \\ -x_3 \end{bmatrix}\]

and we can test this vector for membership in $W$. As before, because $\mathbf{x} \in W$ we
know $2x_1 - 5x_2 + 7x_3 = 0$.

\begin{align*}
2(-x_1) - 5(-x_2) + 7(-x_3) &= -(2x_1 - 5x_2 + 7x_3) \\
&= -0 \\
&= 0
\end{align*}

and we now believe that $-\mathbf{x} \in W$.

Is  the  vector  addition  in  W commutative  (Property  C)?  Is  x + y = y + x?  Of 
course! Nothing about restricting the scope of our set of vectors will prevent the
operation from still being commutative. Indeed, the remaining five properties are
unaffected by the transition to a smaller set of vectors, and so remain true. That
was convenient.

So $W$ satisfies all ten properties, is therefore a vector space, and thus earns the
title of being a subspace of $\mathbb{C}^3$. △
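The membership test at the heart of Example SC3 is easy to mimic in code. This is a minimal sketch, assuming vectors of $\mathbb{C}^3$ are stored as 3-tuples; the names `in_W`, `vadd`, `smul` and the sample vectors are invented for the illustration. It checks the closure computations on particular vectors only, while the example itself establishes them for every vector in $W$.

```python
def in_W(x):
    """Membership test for W: does 2*x1 - 5*x2 + 7*x3 equal 0?"""
    x1, x2, x3 = x
    return 2 * x1 - 5 * x2 + 7 * x3 == 0


def vadd(x, y):
    """Usual vector addition in C^3."""
    return tuple(a + b for a, b in zip(x, y))


def smul(alpha, x):
    """Usual scalar multiplication in C^3."""
    return tuple(alpha * a for a in x)


x = (1, -1, -1)      # 2(1) - 5(-1) + 7(-1) = 0, so x is in W
y = (5, 2, 0)        # 2(5) - 5(2) + 7(0) = 0, so y is in W
outside = (1, 0, 0)  # 2(1) = 2, so this vector is not in W
alpha = 3 - 2j

print(in_W(x), in_W(y), in_W(outside))
print(in_W(vadd(x, y)))      # additive closure on this sample
print(in_W(smul(alpha, x)))  # scalar closure on this sample
print(in_W((0, 0, 0)))       # the zero vector of C^3 lies in W
print(in_W(smul(-1, x)))     # the additive inverse of x lies in W
```

The sample entries are integers (and a scalar with integer parts) so that the comparison with zero is exact; with arbitrary floating-point entries a tolerance would be needed.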

Subsection  TS 
Testing  Subspaces 

In  Example  SC3  we  proceeded  through  all  ten  of  the  vector  space  properties  before 
believing  that  a subset  was  a subspace.  But  six  of  the  properties  were  easy  to  prove, 
and  we  can  lean  on  some  of  the  properties  of  the  vector  space  (the  superset)  to  make 
the  other  four  easier.  Here  is  a theorem  that  will  make  it  easier  to  test  if  a subset  is 
a vector  space.  A shortcut  if  there  ever  was  one. 

Theorem  TSS  Testing  Subsets  for  Subspaces 

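Suppose that $V$ is a vector space and $W$ is a subset of $V$, $W \subseteq V$. Endow $W$ with
the same operations as $V$. Then $W$ is a subspace if and only if three conditions are
met

1. $W$ is nonempty, $W \neq \emptyset$.

2. If $\mathbf{x} \in W$ and $\mathbf{y} \in W$, then $\mathbf{x} + \mathbf{y} \in W$.

3. If $\alpha \in \mathbb{C}$ and $\mathbf{x} \in W$, then $\alpha\mathbf{x} \in W$.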

Proof. ($\Rightarrow$) We have the hypothesis that $W$ is a subspace, so by Property Z we know
that $W$ contains a zero vector. This is enough to show that $W \neq \emptyset$. Also, since $W$ is
a vector space it satisfies the additive and scalar multiplication closure properties
(Property AC, Property SC), and so exactly meets the second and third conditions.
If that was easy, the other direction might require a bit more work.

($\Leftarrow$) We have three properties for our hypothesis, and from this we should
conclude that $W$ has the ten defining properties of a vector space. The second and
third conditions of our hypothesis are exactly Property AC and Property SC. Our
hypothesis that $V$ is a vector space implies that Property C, Property AA, Property
SMA, Property DVA, Property DSA and Property O all hold. They continue to be
true for vectors from $W$ since passing to a subset, and keeping the operation the
same, leaves their statements unchanged. Eight down, two to go.

Suppose $\mathbf{x} \in W$. Then by the third part of our hypothesis (scalar closure), we
know that $(-1)\mathbf{x} \in W$. By Theorem AISM $(-1)\mathbf{x} = -\mathbf{x}$, so together these statements
show us that $-\mathbf{x} \in W$. $-\mathbf{x}$ is the additive inverse of $\mathbf{x}$ in $V$, but will continue in
this role when viewed as an element of the subset $W$. So every element of $W$ has an
additive inverse that is an element of $W$ and Property AI is completed. Just one
property left.

While we have implicitly discussed the zero vector in the previous paragraph, we
need to be certain that the zero vector (of $V$) really lives in $W$. Since $W$ is nonempty,
we can choose some vector $\mathbf{z} \in W$. Then by the argument in the previous paragraph,
we know $-\mathbf{z} \in W$. Now by Property AI for $V$ and then by the second part of our
hypothesis (additive closure) we see that

\[\mathbf{0} = \mathbf{z} + (-\mathbf{z}) \in W\]

So $W$ contains the zero vector from $V$. Since this vector performs the required
duties of a zero vector in $V$, it will continue in that role as an element of $W$. This
gives us Property Z, the final property of the ten required. (Sarah Fellez contributed
to this proof.) ■

So  just  three  conditions,  plus  being  a subset  of  a known  vector  space,  gets  us  all 
ten  properties.  Fabulous!  This  theorem  can  be  paraphrased  by  saying  that  a subspace 
is  “a  nonempty  subset  (of  a vector  space)  that  is  closed  under  vector  addition  and 
scalar  multiplication.” 

You  might  want  to  go  back  and  rework  Example  SC3  in  light  of  this  result, 
perhaps  seeing  where  we  can  now  economize  or  where  the  work  done  in  the  example 
mirrored  the  proof  and  where  it  did  not.  We  will  press  on  and  apply  this  theorem  in 
a slightly  more  abstract  setting. 

Example SP4 A subspace of $P_4$

$P_4$ is the vector space of polynomials with degree at most 4 (Example VSP). Define
a subset $W$ as

$$W = \{\, p(x) \mid p \in P_4,\ p(2) = 0 \,\}$$

so $W$ is the collection of those polynomials (with degree 4 or less) whose graphs
cross the $x$-axis at $x = 2$. Whenever we encounter a new set it is a good idea to gain
a better understanding of the set by finding a few elements in the set, and a few
outside it. For example $x^2 - x - 2 \in W$, while $x^4 + x^3 - 7 \notin W$.

Is $W$ nonempty? Yes, $x - 2 \in W$.

Additive closure? Suppose $p \in W$ and $q \in W$. Is $p + q \in W$? $p$ and $q$ are not
totally arbitrary; we know that $p(2) = 0$ and $q(2) = 0$. Then we can check $p + q$ for
membership in $W$,

$$\begin{aligned}
(p + q)(2) &= p(2) + q(2) && \text{Addition in } P_4 \\
&= 0 + 0 && p \in W,\ q \in W \\
&= 0
\end{aligned}$$

so we see that $p + q$ qualifies for membership in $W$.

Scalar multiplication closure? Suppose that $\alpha \in \mathbb{C}$ and $p \in W$. Then we know
that $p(2) = 0$. Testing $\alpha p$ for membership,

$$\begin{aligned}
(\alpha p)(2) &= \alpha p(2) && \text{Scalar multiplication in } P_4 \\
&= \alpha 0 && p \in W \\
&= 0
\end{aligned}$$

so $\alpha p \in W$.

We have shown that $W$ meets the three conditions of Theorem TSS and so
qualifies as a subspace of $P_4$. Notice that by Definition S we now know that $W$ is
also a vector space. So all the properties of a vector space (Definition VS) and the
theorems of Section VS apply in full. △
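If you would like to experiment with Example SP4, here is a minimal Python sketch. It is not part of the original text: the list representation (constant term first) and the helper names `evaluate` and `in_W` are our own assumptions, and checking a few sample polynomials illustrates the closure conditions without proving them.

```python
def evaluate(coeffs, x):
    """Evaluate a polynomial [a0, a1, ..., a4] (constant term first) at x
    using Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result


def in_W(coeffs):
    """Membership test for W = { p in P4 : p(2) = 0 }."""
    return evaluate(coeffs, 2) == 0


p = [-2, -1, 1, 0, 0]   # x^2 - x - 2
q = [-2, 1, 0, 0, 0]    # x - 2
r = [-7, 0, 0, 1, 1]    # x^4 + x^3 - 7

# Sample instances of the two closure conditions (illustration only).
p_plus_q = [a + b for a, b in zip(p, q)]
five_p = [5 * a for a in p]

print(in_W(p), in_W(q), in_W(r))        # members, members, non-member
print(in_W(p_plus_q), in_W(five_p))     # both stay inside W
```

The proof in the example is what guarantees closure for every choice of $p$, $q$ and $\alpha$; the code only spot-checks a handful of them.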


Much  of  the  power  of  Theorem  TSS  is  that  we  can  easily  establish  new  vector 
spaces  if  we  can  locate  them  as  subsets  of  other  vector  spaces,  such  as  the  vector 
spaces  presented  in  Subsection  VS.EVS. 

It  can  be  as  instructive  to  consider  some  subsets  that  are  not  subspaces.  Since 
Theorem  TSS  is  an  equivalence  (see  Proof  Technique  E)  we  can  be  assured  that  a 
subset  is  not  a subspace  if  it  violates  one  of  the  three  conditions,  and  in  any  example 
of  interest  this  will  not  be  the  “nonempty”  condition.  However,  since  a subspace  has 
to  be  a vector  space  in  its  own  right,  we  can  also  search  for  a violation  of  any  one 
of  the  ten  defining  properties  in  Definition  VS  or  any  inherent  property  of  a vector 
space,  such  as  those  given  by  the  basic  theorems  of  Subsection  VS.VSP.  Notice  also 
that  a violation  need  only  be  for  a specific  vector  or  pair  of  vectors. 

Example NSC2Z A non-subspace in $\mathbb{C}^2$, zero vector

Consider the subset $W$ below as a candidate for being a subspace of $\mathbb{C}^2$

$$W = \left\{ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \,\middle|\, 3x_1 - 5x_2 = 12 \right\}$$

The zero vector of $\mathbb{C}^2$, $\mathbf{0} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, will need to be the zero vector in $W$ also. However,
$\mathbf{0} \notin W$ since $3(0) - 5(0) = 0 \neq 12$. So $W$ has no zero vector and fails Property Z
of Definition VS. This subset also fails to be closed under addition and scalar
multiplication. Can you find examples of this? △


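Example NSC2A A non-subspace in $\mathbb{C}^2$, additive closure

Consider the subset $X$ below as a candidate for being a subspace of $\mathbb{C}^2$

$$X = \left\{ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \,\middle|\, x_1 x_2 = 0 \right\}$$

You can check that $\mathbf{0} \in X$, so the approach of the last example will not get us
anywhere. However, notice that $\mathbf{x} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \in X$ and $\mathbf{y} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \in X$. Yet

$$\mathbf{x} + \mathbf{y} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \notin X$$

So $X$ fails the additive closure requirement of either Property AC or Theorem
TSS, and is therefore not a subspace. △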
Example NSC2S A non-subspace in $\mathbb{C}^2$, scalar multiplication closure

Consider the subset $Y$ below as a candidate for being a subspace of $\mathbb{C}^2$

$$Y = \left\{ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \,\middle|\, x_1 \in \mathbb{Z},\ x_2 \in \mathbb{Z} \right\}$$

$\mathbb{Z}$ is the set of integers, so we are only allowing "whole numbers" as the constituents
of our vectors. Now, $\mathbf{0} \in Y$, and additive closure also holds (can you prove these
claims?). So we will have to try something different. Note that $\alpha = \frac{1}{2} \in \mathbb{C}$ and
$\mathbf{x} = \begin{bmatrix} 2 \\ 3 \end{bmatrix} \in Y$, but

$$\alpha\mathbf{x} = \frac{1}{2} \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ \frac{3}{2} \end{bmatrix} \notin Y$$

So $Y$ fails the scalar multiplication closure requirement of either Property SC or
Theorem TSS, and is therefore not a subspace. △
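The three non-examples above each hinge on a single counterexample, and a counterexample is easy to check by machine. The Python sketch below is our own illustration, not the book's: the function names are hypothetical, and we restrict the scalars to exact rational numbers (a small slice of $\mathbb{C}$) so that the arithmetic stays exact.

```python
from fractions import Fraction


def in_W(v):
    """Example NSC2Z: vectors with 3*x1 - 5*x2 = 12."""
    return 3 * v[0] - 5 * v[1] == 12


def in_X(v):
    """Example NSC2A: vectors with x1*x2 = 0."""
    return v[0] * v[1] == 0


def in_Y(v):
    """Example NSC2S: vectors whose entries are integers."""
    return all(Fraction(c).denominator == 1 for c in v)


def add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def scale(alpha, v):
    return tuple(alpha * a for a in v)


zero = (0, 0)
print(in_W(zero))                           # the zero vector is missing from W

x, y = (1, 0), (0, 1)
print(in_X(x), in_X(y), in_X(add(x, y)))    # x and y are in X, their sum is not

half = Fraction(1, 2)
print(in_Y((2, 3)), in_Y(scale(half, (2, 3))))   # (2,3) is in Y, half of it is not
```

One failing vector (or pair of vectors) is all it takes, which is why these checks are conclusive in a way that the spot-checks of closure for a genuine subspace are not.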


There  are  two  examples  of  subspaces  that  are  trivial.  Suppose  that  V is  any 
vector  space.  Then  V is  a subset  of  itself  and  is  a vector  space.  By  Definition  S,  V 
qualifies  as  a subspace  of  itself.  The  set  containing  just  the  zero  vector  Z = {0}  is 
also  a subspace  as  can  be  seen  by  applying  Theorem  TSS  or  by  simple  modifications 
of  the  techniques  hinted  at  in  Example  VSS.  Since  these  subspaces  are  so  obvious 
(and  therefore  not  too  interesting)  we  will  refer  to  them  as  being  trivial. 

Definition  TS  Trivial  Subspaces 

Given the vector space $V$, the subspaces $V$ and $\{\mathbf{0}\}$ are each called a trivial
subspace. □


We  can  also  use  Theorem  TSS  to  prove  more  general  statements  about  subspaces, 
as  illustrated  in  the  next  theorem. 


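Theorem NSMS Null Space of a Matrix is a Subspace

Suppose that $A$ is an $m \times n$ matrix. Then the null space of $A$, $\mathcal{N}(A)$, is a subspace
of $\mathbb{C}^n$.


Proof. We will examine the three requirements of Theorem TSS. Recall that Definition
NSM can be formulated as $\mathcal{N}(A) = \{\, \mathbf{x} \in \mathbb{C}^n \mid A\mathbf{x} = \mathbf{0} \,\}$.

First, $\mathbf{0} \in \mathcal{N}(A)$, which can be inferred as a consequence of Theorem HSC. So
$\mathcal{N}(A) \neq \emptyset$.

Second, check additive closure by supposing that $\mathbf{x} \in \mathcal{N}(A)$ and $\mathbf{y} \in \mathcal{N}(A)$. So
we know a little something about $\mathbf{x}$ and $\mathbf{y}$: $A\mathbf{x} = \mathbf{0}$ and $A\mathbf{y} = \mathbf{0}$, and that is all we
know. Question: Is $\mathbf{x} + \mathbf{y} \in \mathcal{N}(A)$? Let us check.

$$\begin{aligned}
A(\mathbf{x} + \mathbf{y}) &= A\mathbf{x} + A\mathbf{y} && \text{Theorem MMDAA} \\
&= \mathbf{0} + \mathbf{0} && \mathbf{x} \in \mathcal{N}(A),\ \mathbf{y} \in \mathcal{N}(A) \\
&= \mathbf{0} && \text{Theorem VSPCV}
\end{aligned}$$

So, yes, $\mathbf{x} + \mathbf{y}$ qualifies for membership in $\mathcal{N}(A)$.

Third, check scalar multiplication closure by supposing that $\alpha \in \mathbb{C}$ and $\mathbf{x} \in \mathcal{N}(A)$.
So we know a little something about $\mathbf{x}$: $A\mathbf{x} = \mathbf{0}$, and that is all we know. Question:
Is $\alpha\mathbf{x} \in \mathcal{N}(A)$? Let us check.

$$\begin{aligned}
A(\alpha\mathbf{x}) &= \alpha(A\mathbf{x}) && \text{Theorem MMSMM} \\
&= \alpha\mathbf{0} && \mathbf{x} \in \mathcal{N}(A) \\
&= \mathbf{0} && \text{Theorem ZVSM}
\end{aligned}$$

So, yes, $\alpha\mathbf{x}$ qualifies for membership in $\mathcal{N}(A)$.

Having met the three conditions in Theorem TSS we can now say that the null
space of a matrix is a subspace (and hence a vector space in its own right!). ■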


Here  is  an  example  where  we  can  exercise  Theorem  NSMS. 

Example RSNS Recasting a subspace as a null space

Consider the subset of $\mathbb{C}^5$ defined as

$$W = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} \,\middle|\,
\begin{aligned}
3x_1 + x_2 - 5x_3 + 7x_4 + x_5 &= 0, \\
4x_1 + 6x_2 + 3x_3 - 6x_4 - 5x_5 &= 0, \\
-2x_1 + 4x_2 + 7x_4 + x_5 &= 0
\end{aligned} \right\}$$

It is possible to show that $W$ is a subspace of $\mathbb{C}^5$ by checking the three conditions
of Theorem TSS directly, but it will get tedious rather quickly. Instead, give $W$
a fresh look and notice that it is a set of solutions to a homogeneous system of
equations. Define the matrix

$$A = \begin{bmatrix} 3 & 1 & -5 & 7 & 1 \\ 4 & 6 & 3 & -6 & -5 \\ -2 & 4 & 0 & 7 & 1 \end{bmatrix}$$

and then recognize that $W = \mathcal{N}(A)$. By Theorem NSMS we can immediately see
that $W$ is a subspace. Boom! △
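To see Theorem NSMS at work on the matrix of Example RSNS, here is a hedged Python sketch using exact rational arithmetic. The helpers `rref`, `null_space_basis` and `mat_vec` are our own small routines written for this illustration (they are not Sage commands and not part of the original text). The sketch builds some vectors of $\mathcal{N}(A)$ and confirms that a sum and a scalar multiple land back in the null space.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row-echelon form, list of pivot columns), exactly."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c, at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def null_space_basis(rows):
    """One basis vector per free variable of the homogeneous system."""
    reduced, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for row_index, p in enumerate(pivots):
            v[p] = -reduced[row_index][f]
        basis.append(v)
    return basis


def mat_vec(rows, v):
    return [sum(a * x for a, x in zip(row, v)) for row in rows]


A = [[3, 1, -5, 7, 1],
     [4, 6, 3, -6, -5],
     [-2, 4, 0, 7, 1]]

basis = null_space_basis(A)
u, v = basis[0], basis[1]
total = [a + b for a, b in zip(u, v)]
scaled = [Fraction(7, 3) * a for a in u]

for w in (u, v, total, scaled):
    print(mat_vec(A, w))   # each product is the zero vector of size 3
```

Since $A$ has three rows and five columns there are at least two free variables, so `basis[0]` and `basis[1]` always exist here.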


Subsection  TSS 
The  Span  of  a Set 

The  span  of  a set  of  column  vectors  got  a heavy  workout  in  Chapter  V and  Chapter 
M.  The  definition  of  the  span  depended  only  on  being  able  to  formulate  linear 
combinations.  In  any  of  our  more  general  vector  spaces  we  always  have  a definition 
of  vector  addition  and  of  scalar  multiplication.  So  we  can  build  linear  combinations 
and  manufacture  spans.  This  subsection  contains  two  definitions  that  are  just  mild 
variants  of  definitions  we  have  seen  earlier  for  column  vectors.  If  you  have  not  already, 
compare them with Definition LCCV and Definition SSCV.

Definition LC Linear Combination

Suppose that $V$ is a vector space. Given $n$ vectors $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_n$ and $n$ scalars
$\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_n$, their linear combination is the vector

$$\alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \cdots + \alpha_n\mathbf{u}_n$$

□


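Example LCM A linear combination of matrices

In the vector space $M_{23}$ of $2 \times 3$ matrices, we have the vectors

$$\mathbf{x} = \begin{bmatrix} 1 & 3 & -2 \\ 2 & 0 & 7 \end{bmatrix} \qquad
\mathbf{y} = \begin{bmatrix} 3 & -1 & 2 \\ 5 & 5 & 1 \end{bmatrix} \qquad
\mathbf{z} = \begin{bmatrix} 4 & 2 & -4 \\ 1 & 1 & 1 \end{bmatrix}$$

and we can form linear combinations such as

$$\begin{aligned}
2\mathbf{x} + 4\mathbf{y} + (-1)\mathbf{z}
&= 2\begin{bmatrix} 1 & 3 & -2 \\ 2 & 0 & 7 \end{bmatrix}
 + 4\begin{bmatrix} 3 & -1 & 2 \\ 5 & 5 & 1 \end{bmatrix}
 + (-1)\begin{bmatrix} 4 & 2 & -4 \\ 1 & 1 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 2 & 6 & -4 \\ 4 & 0 & 14 \end{bmatrix}
 + \begin{bmatrix} 12 & -4 & 8 \\ 20 & 20 & 4 \end{bmatrix}
 + \begin{bmatrix} -4 & -2 & 4 \\ -1 & -1 & -1 \end{bmatrix} \\
&= \begin{bmatrix} 10 & 0 & 8 \\ 23 & 19 & 17 \end{bmatrix}
\end{aligned}$$

or,

$$\begin{aligned}
4\mathbf{x} - 2\mathbf{y} + 3\mathbf{z}
&= 4\begin{bmatrix} 1 & 3 & -2 \\ 2 & 0 & 7 \end{bmatrix}
 - 2\begin{bmatrix} 3 & -1 & 2 \\ 5 & 5 & 1 \end{bmatrix}
 + 3\begin{bmatrix} 4 & 2 & -4 \\ 1 & 1 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 4 & 12 & -8 \\ 8 & 0 & 28 \end{bmatrix}
 + \begin{bmatrix} -6 & 2 & -4 \\ -10 & -10 & -2 \end{bmatrix}
 + \begin{bmatrix} 12 & 6 & -12 \\ 3 & 3 & 3 \end{bmatrix} \\
&= \begin{bmatrix} 10 & 20 & -24 \\ 1 & -7 & 29 \end{bmatrix}
\end{aligned}$$

△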
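The two computations in Example LCM are entry-by-entry arithmetic, so they are easy to reproduce. The following Python sketch is our own illustration (the function name `linear_combination` and the nested-list representation of a matrix are assumptions, not the book's notation).

```python
def linear_combination(scalars, matrices):
    """Form a1*M1 + a2*M2 + ... entry by entry (Definition LC in M_23)."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(a * M[i][j] for a, M in zip(scalars, matrices))
             for j in range(cols)]
            for i in range(rows)]


x = [[1, 3, -2], [2, 0, 7]]
y = [[3, -1, 2], [5, 5, 1]]
z = [[4, 2, -4], [1, 1, 1]]

first = linear_combination([2, 4, -1], [x, y, z])
second = linear_combination([4, -2, 3], [x, y, z])

print(first)    # [[10, 0, 8], [23, 19, 17]]
print(second)   # [[10, 20, -24], [1, -7, 29]]
```

Nothing here is special to $2 \times 3$ matrices: any vector space whose elements can be stored as arrays of numbers admits the same one-line definition.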


When  we  realize  that  we  can  form  linear  combinations  in  any  vector  space,  then 
it  is  natural  to  revisit  our  definition  of  the  span  of  a set,  since  it  is  the  set  of  all 
possible  linear  combinations  of  a set  of  vectors. 

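Definition SS Span of a Set

Suppose that $V$ is a vector space. Given a set of vectors $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_t\}$,
their span, $\langle S \rangle$, is the set of all possible linear combinations of $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_t$.
Symbolically,

$$\begin{aligned}
\langle S \rangle &= \{\, \alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \cdots + \alpha_t\mathbf{u}_t \mid \alpha_i \in \mathbb{C},\ 1 \le i \le t \,\} \\
&= \left\{\, \sum_{i=1}^{t} \alpha_i\mathbf{u}_i \,\middle|\, \alpha_i \in \mathbb{C},\ 1 \le i \le t \,\right\}
\end{aligned}$$

□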
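Theorem SSS Span of a Set is a Subspace

Suppose $V$ is a vector space. Given a set of vectors $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_t\} \subseteq V$,
their span, $\langle S \rangle$, is a subspace.

Proof. By Definition SS, the span contains linear combinations of vectors from the
vector space $V$, so by repeated use of the closure properties, Property AC and
Property SC, $\langle S \rangle$ can be seen to be a subset of $V$.

We will then verify the three conditions of Theorem TSS. First,

$$\begin{aligned}
\mathbf{0} &= \mathbf{0} + \mathbf{0} + \mathbf{0} + \cdots + \mathbf{0} && \text{Property Z} \\
&= 0\mathbf{u}_1 + 0\mathbf{u}_2 + 0\mathbf{u}_3 + \cdots + 0\mathbf{u}_t && \text{Theorem ZSSM}
\end{aligned}$$

So we have written $\mathbf{0}$ as a linear combination of the vectors in $S$ and by Definition
SS, $\mathbf{0} \in \langle S \rangle$ and therefore $\langle S \rangle \neq \emptyset$.

Second, suppose $\mathbf{x} \in \langle S \rangle$ and $\mathbf{y} \in \langle S \rangle$. Can we conclude that $\mathbf{x} + \mathbf{y} \in \langle S \rangle$? What
do we know about $\mathbf{x}$ and $\mathbf{y}$ by virtue of their membership in $\langle S \rangle$? There must be
scalars from $\mathbb{C}$, $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_t$ and $\beta_1, \beta_2, \beta_3, \ldots, \beta_t$ so that

$$\begin{aligned}
\mathbf{x} &= \alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \cdots + \alpha_t\mathbf{u}_t \\
\mathbf{y} &= \beta_1\mathbf{u}_1 + \beta_2\mathbf{u}_2 + \beta_3\mathbf{u}_3 + \cdots + \beta_t\mathbf{u}_t
\end{aligned}$$

Then

$$\begin{aligned}
\mathbf{x} + \mathbf{y} &= \alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \cdots + \alpha_t\mathbf{u}_t \\
&\quad + \beta_1\mathbf{u}_1 + \beta_2\mathbf{u}_2 + \beta_3\mathbf{u}_3 + \cdots + \beta_t\mathbf{u}_t \\
&= \alpha_1\mathbf{u}_1 + \beta_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \beta_2\mathbf{u}_2 \\
&\quad + \alpha_3\mathbf{u}_3 + \beta_3\mathbf{u}_3 + \cdots + \alpha_t\mathbf{u}_t + \beta_t\mathbf{u}_t && \text{Property AA, Property C} \\
&= (\alpha_1 + \beta_1)\mathbf{u}_1 + (\alpha_2 + \beta_2)\mathbf{u}_2 \\
&\quad + (\alpha_3 + \beta_3)\mathbf{u}_3 + \cdots + (\alpha_t + \beta_t)\mathbf{u}_t && \text{Property DSA}
\end{aligned}$$

Since each $\alpha_i + \beta_i$ is again a scalar from $\mathbb{C}$ we have expressed the vector sum $\mathbf{x} + \mathbf{y}$
as a linear combination of the vectors from $S$, and therefore by Definition SS we can
say that $\mathbf{x} + \mathbf{y} \in \langle S \rangle$.

Third, suppose $\alpha \in \mathbb{C}$ and $\mathbf{x} \in \langle S \rangle$. Can we conclude that $\alpha\mathbf{x} \in \langle S \rangle$? What do
we know about $\mathbf{x}$ by virtue of its membership in $\langle S \rangle$? There must be scalars from $\mathbb{C}$,
$\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_t$ so that

$$\mathbf{x} = \alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \cdots + \alpha_t\mathbf{u}_t$$

Then

$$\begin{aligned}
\alpha\mathbf{x} &= \alpha\left(\alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \cdots + \alpha_t\mathbf{u}_t\right) \\
&= \alpha(\alpha_1\mathbf{u}_1) + \alpha(\alpha_2\mathbf{u}_2) + \alpha(\alpha_3\mathbf{u}_3) + \cdots + \alpha(\alpha_t\mathbf{u}_t) && \text{Property DVA} \\
&= (\alpha\alpha_1)\mathbf{u}_1 + (\alpha\alpha_2)\mathbf{u}_2 + (\alpha\alpha_3)\mathbf{u}_3 + \cdots + (\alpha\alpha_t)\mathbf{u}_t && \text{Property SMA}
\end{aligned}$$

Since each $\alpha\alpha_i$ is again a scalar from $\mathbb{C}$ we have expressed the scalar multiple $\alpha\mathbf{x}$ as
a linear combination of the vectors from $S$, and therefore by Definition SS we can
say that $\alpha\mathbf{x} \in \langle S \rangle$.

With the three conditions of Theorem TSS met, we can say that $\langle S \rangle$ is a subspace
(and so is also a vector space, Definition VS). (See Exercise SS.T20, Exercise SS.T21,
Exercise SS.T22.) ■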

Example SSP Span of a set of polynomials

In Example SP4 we proved that

$$W = \{\, p(x) \mid p \in P_4,\ p(2) = 0 \,\}$$

is a subspace of $P_4$, the vector space of polynomials of degree at most 4. Since $W$ is
a vector space itself, let us construct a span within $W$. First let

$$S = \{\, x^4 - 4x^3 + 5x^2 - x - 2,\ 2x^4 - 3x^3 - 6x^2 + 6x + 4 \,\}$$

and verify that $S$ is a subset of $W$ by checking that each of these two polynomials
has $x = 2$ as a root. Now, if we define $U = \langle S \rangle$, then Theorem SSS tells us that $U$ is
a subspace of $W$. So quite quickly we have built a chain of subspaces, $U$ inside $W$,
and $W$ inside $P_4$.

Rather than dwell on how quickly we can build subspaces, let us try to gain a
better understanding of just how the span construction creates subspaces, in the
context of this example. We can quickly build representative elements of $U$,

$$3(x^4 - 4x^3 + 5x^2 - x - 2) + 5(2x^4 - 3x^3 - 6x^2 + 6x + 4) = 13x^4 - 27x^3 - 15x^2 + 27x + 14$$

and

$$(-2)(x^4 - 4x^3 + 5x^2 - x - 2) + 8(2x^4 - 3x^3 - 6x^2 + 6x + 4) = 14x^4 - 16x^3 - 58x^2 + 50x + 36$$

and each of these polynomials must be in $W$ since it is closed under addition and
scalar multiplication. But you might check for yourself that both of these polynomials
have $x = 2$ as a root.

I can tell you that $y = 3x^4 - 7x^3 - x^2 + 7x - 2$ is not in $U$, but would you believe
me? A first check shows that $y$ does have $x = 2$ as a root, but that only shows that
$y \in W$. What does $y$ have to do to gain membership in $U = \langle S \rangle$? It must be a linear
combination of the vectors in $S$, $x^4 - 4x^3 + 5x^2 - x - 2$ and $2x^4 - 3x^3 - 6x^2 + 6x + 4$.
So let us suppose that $y$ is such a linear combination,

$$\begin{aligned}
y &= 3x^4 - 7x^3 - x^2 + 7x - 2 \\
&= \alpha_1(x^4 - 4x^3 + 5x^2 - x - 2) + \alpha_2(2x^4 - 3x^3 - 6x^2 + 6x + 4) \\
&= (\alpha_1 + 2\alpha_2)x^4 + (-4\alpha_1 - 3\alpha_2)x^3 + (5\alpha_1 - 6\alpha_2)x^2 \\
&\quad + (-\alpha_1 + 6\alpha_2)x + (-2\alpha_1 + 4\alpha_2)
\end{aligned}$$

Notice that operations above are done in accordance with the definition of the
vector space of polynomials (Example VSP). Now, if we equate coefficients, which is
the definition of equality for polynomials, then we obtain the system of five linear
equations in two variables

$$\begin{aligned}
\alpha_1 + 2\alpha_2 &= 3 \\
-4\alpha_1 - 3\alpha_2 &= -7 \\
5\alpha_1 - 6\alpha_2 &= -1 \\
-\alpha_1 + 6\alpha_2 &= 7 \\
-2\alpha_1 + 4\alpha_2 &= -2
\end{aligned}$$

Build an augmented matrix from the system and row-reduce,

$$\begin{bmatrix} 1 & 2 & 3 \\ -4 & -3 & -7 \\ 5 & -6 & -1 \\ -1 & 6 & 7 \\ -2 & 4 & -2 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} \boxed{1} & 0 & 0 \\ 0 & \boxed{1} & 0 \\ 0 & 0 & \boxed{1} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

Since the final column of the row-reduced augmented matrix is a pivot column,
Theorem RCLS tells us the system of equations is inconsistent. Therefore, there are
no scalars, $\alpha_1$ and $\alpha_2$, to establish $y$ as a linear combination of the elements in $S$.
So $y \notin U$. △
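The membership question in Example SSP reduces to row-reducing an augmented matrix, and that procedure can be sketched in a few lines of Python. What follows is our own illustration under stated assumptions: polynomials are stored as coefficient lists (from $x^4$ down to the constant term), and `rref` and `in_span` are hypothetical helper names, not commands from the text or from Sage.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row-echelon form, list of pivot columns), exactly."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def in_span(vectors, target):
    """True when target is a linear combination of vectors.

    The vectors become the columns of the coefficient matrix and the
    target becomes the final column of the augmented matrix; the system
    is consistent exactly when that final column is not a pivot column.
    """
    augmented = [list(entries) + [t]
                 for entries, t in zip(zip(*vectors), target)]
    _, pivots = rref(augmented)
    return len(vectors) not in pivots


s1 = [1, -4, 5, -1, -2]     # x^4 - 4x^3 + 5x^2 - x - 2
s2 = [2, -3, -6, 6, 4]      # 2x^4 - 3x^3 - 6x^2 + 6x + 4
y = [3, -7, -1, 7, -2]      # 3x^4 - 7x^3 - x^2 + 7x - 2
u = [13, -27, -15, 27, 14]  # 3*s1 + 5*s2, built in the example

print(in_span([s1, s2], y))   # False: the system is inconsistent
print(in_span([s1, s2], u))   # True
```

The consistency test in `in_span` is just Theorem RCLS phrased in terms of pivot columns.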


Let  us  again  examine  membership  in  a span. 


Example SM32 A subspace of $M_{32}$

The set of all $3 \times 2$ matrices forms a vector space when we use the operations of
matrix addition (Definition MA) and scalar matrix multiplication (Definition MSM),
as was shown in Example VSM. Consider the subset

$$S = \left\{
\begin{bmatrix} 3 & 1 \\ 4 & 2 \\ 5 & -5 \end{bmatrix},\
\begin{bmatrix} 1 & 1 \\ 2 & -1 \\ 14 & -1 \end{bmatrix},\
\begin{bmatrix} 3 & -1 \\ -1 & 2 \\ -19 & -11 \end{bmatrix},\
\begin{bmatrix} 4 & 2 \\ 1 & -2 \\ 14 & -2 \end{bmatrix},\
\begin{bmatrix} 3 & 1 \\ -4 & 0 \\ -17 & 7 \end{bmatrix}
\right\}$$

and define a new subset of vectors $W$ in $M_{32}$ using the span (Definition SS), $W = \langle S \rangle$.
So by Theorem SSS we know that $W$ is a subspace of $M_{32}$. While $W$ is an infinite set,
and this is a precise description, it would still be worthwhile to investigate whether
or not $W$ contains certain elements.

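First, is

$$\mathbf{y} = \begin{bmatrix} 9 & 3 \\ 7 & 3 \\ 10 & -11 \end{bmatrix}$$

in $W$? To answer this, we want to determine if $\mathbf{y}$ can be written as a linear combination
of the five matrices in $S$. Can we find scalars, $\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5$ so that

$$\begin{aligned}
\begin{bmatrix} 9 & 3 \\ 7 & 3 \\ 10 & -11 \end{bmatrix}
&= \alpha_1\begin{bmatrix} 3 & 1 \\ 4 & 2 \\ 5 & -5 \end{bmatrix}
 + \alpha_2\begin{bmatrix} 1 & 1 \\ 2 & -1 \\ 14 & -1 \end{bmatrix}
 + \alpha_3\begin{bmatrix} 3 & -1 \\ -1 & 2 \\ -19 & -11 \end{bmatrix}
 + \alpha_4\begin{bmatrix} 4 & 2 \\ 1 & -2 \\ 14 & -2 \end{bmatrix}
 + \alpha_5\begin{bmatrix} 3 & 1 \\ -4 & 0 \\ -17 & 7 \end{bmatrix} \\
&= \begin{bmatrix}
3\alpha_1 + \alpha_2 + 3\alpha_3 + 4\alpha_4 + 3\alpha_5 & \alpha_1 + \alpha_2 - \alpha_3 + 2\alpha_4 + \alpha_5 \\
4\alpha_1 + 2\alpha_2 - \alpha_3 + \alpha_4 - 4\alpha_5 & 2\alpha_1 - \alpha_2 + 2\alpha_3 - 2\alpha_4 \\
5\alpha_1 + 14\alpha_2 - 19\alpha_3 + 14\alpha_4 - 17\alpha_5 & -5\alpha_1 - \alpha_2 - 11\alpha_3 - 2\alpha_4 + 7\alpha_5
\end{bmatrix}
\end{aligned}$$

Using our definition of matrix equality (Definition ME) we can translate this
statement into six equations in the five unknowns,

$$\begin{aligned}
3\alpha_1 + \alpha_2 + 3\alpha_3 + 4\alpha_4 + 3\alpha_5 &= 9 \\
\alpha_1 + \alpha_2 - \alpha_3 + 2\alpha_4 + \alpha_5 &= 3 \\
4\alpha_1 + 2\alpha_2 - \alpha_3 + \alpha_4 - 4\alpha_5 &= 7 \\
2\alpha_1 - \alpha_2 + 2\alpha_3 - 2\alpha_4 &= 3 \\
5\alpha_1 + 14\alpha_2 - 19\alpha_3 + 14\alpha_4 - 17\alpha_5 &= 10 \\
-5\alpha_1 - \alpha_2 - 11\alpha_3 - 2\alpha_4 + 7\alpha_5 &= -11
\end{aligned}$$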

This is a linear system of equations, which we can represent with an augmented
matrix and row-reduce in search of solutions. The matrix that is row-equivalent to
the augmented matrix is

$$\begin{bmatrix}
\boxed{1} & 0 & 0 & 0 & \frac{5}{8} & 2 \\
0 & \boxed{1} & 0 & 0 & -\frac{38}{8} & -1 \\
0 & 0 & \boxed{1} & 0 & -\frac{7}{8} & 0 \\
0 & 0 & 0 & \boxed{1} & \frac{17}{8} & 1 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$

So we recognize that the system is consistent since the final column is not a pivot
column (Theorem RCLS), and compute $n - r = 5 - 4 = 1$ free variables (Theorem
FVCS). While there are infinitely many solutions, we are only in pursuit of a single
solution, so let us choose the free variable $\alpha_5 = 0$ for simplicity's sake. Then we
easily see that $\alpha_1 = 2$, $\alpha_2 = -1$, $\alpha_3 = 0$, $\alpha_4 = 1$. So the scalars $\alpha_1 = 2$, $\alpha_2 = -1$,
$\alpha_3 = 0$, $\alpha_4 = 1$, $\alpha_5 = 0$ will provide a linear combination of the elements of $S$ that
equals $\mathbf{y}$, as we can verify by checking,

$$\begin{bmatrix} 9 & 3 \\ 7 & 3 \\ 10 & -11 \end{bmatrix}
= 2\begin{bmatrix} 3 & 1 \\ 4 & 2 \\ 5 & -5 \end{bmatrix}
+ (-1)\begin{bmatrix} 1 & 1 \\ 2 & -1 \\ 14 & -1 \end{bmatrix}
+ (1)\begin{bmatrix} 4 & 2 \\ 1 & -2 \\ 14 & -2 \end{bmatrix}$$

So with one particular linear combination in hand, we are convinced that $\mathbf{y}$ deserves
to be a member of $W = \langle S \rangle$.
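Second, is

$$\mathbf{x} = \begin{bmatrix} 2 & 1 \\ 3 & 1 \\ 4 & -2 \end{bmatrix}$$

in $W$? To answer this, we want to determine if $\mathbf{x}$ can be written as a linear combination
of the five matrices in $S$. Can we find scalars, $\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5$ so that

$$\begin{aligned}
\begin{bmatrix} 2 & 1 \\ 3 & 1 \\ 4 & -2 \end{bmatrix}
&= \alpha_1\begin{bmatrix} 3 & 1 \\ 4 & 2 \\ 5 & -5 \end{bmatrix}
 + \alpha_2\begin{bmatrix} 1 & 1 \\ 2 & -1 \\ 14 & -1 \end{bmatrix}
 + \alpha_3\begin{bmatrix} 3 & -1 \\ -1 & 2 \\ -19 & -11 \end{bmatrix}
 + \alpha_4\begin{bmatrix} 4 & 2 \\ 1 & -2 \\ 14 & -2 \end{bmatrix}
 + \alpha_5\begin{bmatrix} 3 & 1 \\ -4 & 0 \\ -17 & 7 \end{bmatrix} \\
&= \begin{bmatrix}
3\alpha_1 + \alpha_2 + 3\alpha_3 + 4\alpha_4 + 3\alpha_5 & \alpha_1 + \alpha_2 - \alpha_3 + 2\alpha_4 + \alpha_5 \\
4\alpha_1 + 2\alpha_2 - \alpha_3 + \alpha_4 - 4\alpha_5 & 2\alpha_1 - \alpha_2 + 2\alpha_3 - 2\alpha_4 \\
5\alpha_1 + 14\alpha_2 - 19\alpha_3 + 14\alpha_4 - 17\alpha_5 & -5\alpha_1 - \alpha_2 - 11\alpha_3 - 2\alpha_4 + 7\alpha_5
\end{bmatrix}
\end{aligned}$$

Using our definition of matrix equality (Definition ME) we can translate this
statement into six equations in the five unknowns,

$$\begin{aligned}
3\alpha_1 + \alpha_2 + 3\alpha_3 + 4\alpha_4 + 3\alpha_5 &= 2 \\
\alpha_1 + \alpha_2 - \alpha_3 + 2\alpha_4 + \alpha_5 &= 1 \\
4\alpha_1 + 2\alpha_2 - \alpha_3 + \alpha_4 - 4\alpha_5 &= 3 \\
2\alpha_1 - \alpha_2 + 2\alpha_3 - 2\alpha_4 &= 1 \\
5\alpha_1 + 14\alpha_2 - 19\alpha_3 + 14\alpha_4 - 17\alpha_5 &= 4 \\
-5\alpha_1 - \alpha_2 - 11\alpha_3 - 2\alpha_4 + 7\alpha_5 &= -2
\end{aligned}$$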


This is a linear system of equations, which we can represent with an augmented
matrix and row-reduce in search of solutions. The matrix that is row-equivalent to
the augmented matrix is

$$\begin{bmatrix}
\boxed{1} & 0 & 0 & 0 & \frac{5}{8} & 0 \\
0 & \boxed{1} & 0 & 0 & -\frac{38}{8} & 0 \\
0 & 0 & \boxed{1} & 0 & -\frac{7}{8} & 0 \\
0 & 0 & 0 & \boxed{1} & \frac{17}{8} & 0 \\
0 & 0 & 0 & 0 & 0 & \boxed{1} \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$

Since the last column is a pivot column, Theorem RCLS tells us that the system is
inconsistent. Therefore, there are no values for the scalars that will place $\mathbf{x}$ in $W$,
and so we conclude that $\mathbf{x} \notin W$. △
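A claimed linear combination is much easier to check than to find. The Python sketch below is our own illustration for Example SM32 (the names `S`, `combine` and the nested-list storage are assumptions). It confirms the combination found with the free variable $\alpha_5 = 0$, and also a second solution, $(-3, 37, 7, -16, 8)$, that we computed ourselves by taking $\alpha_5 = 8$ instead; it reflects the remark that there are infinitely many solutions.

```python
S = [
    [[3, 1], [4, 2], [5, -5]],
    [[1, 1], [2, -1], [14, -1]],
    [[3, -1], [-1, 2], [-19, -11]],
    [[4, 2], [1, -2], [14, -2]],
    [[3, 1], [-4, 0], [-17, 7]],
]
y = [[9, 3], [7, 3], [10, -11]]


def combine(scalars):
    """Linear combination of the five matrices of S, entry by entry."""
    return [[sum(a * M[i][j] for a, M in zip(scalars, S))
             for j in range(2)]
            for i in range(3)]


# The solution chosen in the example (free variable alpha_5 = 0).
print(combine([2, -1, 0, 1, 0]) == y)

# Another solution of the same system (free variable alpha_5 = 8).
print(combine([-3, 37, 7, -16, 8]) == y)
```

Either set of scalars is enough on its own to certify that $\mathbf{y}$ belongs to the span; showing that a matrix is outside the span is the harder direction and needs the row-reduction argument.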


Notice how Example SSP and Example SM32 contained questions about membership
in a span, but these questions quickly became questions about solutions to
a system of linear equations. This will be a common theme going forward.

Subsection  SC 
Subspace  Constructions 

Several of the subsets of vector spaces that we worked with in Chapter M are also
subspaces: they are closed under vector addition and scalar multiplication in $\mathbb{C}^m$.

Theorem CSMS Column Space of a Matrix is a Subspace
Suppose that $A$ is an $m \times n$ matrix. Then $\mathcal{C}(A)$ is a subspace of $\mathbb{C}^m$.

Proof. Definition CSM shows us that $\mathcal{C}(A)$ is a subset of $\mathbb{C}^m$, and that it is defined
as the span of a set of vectors from $\mathbb{C}^m$ (the columns of the matrix). Since $\mathcal{C}(A)$ is a
span, Theorem SSS says it is a subspace. ■

That was easy! Notice that we could have used this same approach to prove that
the null space is a subspace, since Theorem SSNS provided a description of the null
space of a matrix as the span of a set of vectors. However, I much prefer the current
proof of Theorem NSMS. Speaking of easy, here is a very easy theorem that exposes
another of our constructions as creating subspaces.

Theorem RSMS Row Space of a Matrix is a Subspace
Suppose that $A$ is an $m \times n$ matrix. Then $\mathcal{R}(A)$ is a subspace of $\mathbb{C}^n$.

Proof. Definition RSM says $\mathcal{R}(A) = \mathcal{C}(A^t)$, so the row space of a matrix is a column
space, and every column space is a subspace by Theorem CSMS. That's enough. ■

One more.

Theorem LNSMS Left Null Space of a Matrix is a Subspace
Suppose that $A$ is an $m \times n$ matrix. Then $\mathcal{L}(A)$ is a subspace of $\mathbb{C}^m$.

Proof. Definition LNS says $\mathcal{L}(A) = \mathcal{N}(A^t)$, so the left null space is a null space, and
every null space is a subspace by Theorem NSMS. Done. ■
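The last two proofs work by transposing and then quoting an earlier theorem. As a small hedged illustration, the Python sketch below uses a $3 \times 2$ matrix of our own choosing (it does not come from the text) together with two vectors we picked by hand, and checks that a linear combination of left null space vectors is again in the left null space.

```python
def transpose(rows):
    return [list(col) for col in zip(*rows)]


def mat_vec(rows, v):
    return [sum(a * x for a, x in zip(row, v)) for row in rows]


# A hypothetical 3 x 2 matrix chosen for illustration.
A = [[1, 2],
     [2, 4],
     [3, 6]]
At = transpose(A)       # 2 x 3, so its null space lives in C^3

# Two vectors in the left null space of A, i.e. the null space of A^t.
y1 = [-2, 1, 0]
y2 = [-3, 0, 1]

# A linear combination of y1 and y2 should stay in the left null space.
combo = [4 * a + 5 * b for a, b in zip(y1, y2)]

print(mat_vec(At, y1), mat_vec(At, y2), mat_vec(At, combo))
```

Note how the sizes line up with the theorem: $A$ is $3 \times 2$, so $m = 3$, and the left null space sits inside $\mathbb{C}^3$.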

So  the  span  of  a set  of  vectors,  and  the  null  space,  column  space,  row  space  and 
left  null  space  of  a matrix  are  all  subspaces,  and  hence  are  all  vector  spaces,  meaning 
they  have  all  the  properties  detailed  in  Definition  VS  and  in  the  basic  theorems 
presented  in  Section  VS.  We  have  worked  with  these  objects  as  just  sets  in  Chapter 
V and  Chapter  M,  but  now  we  understand  that  they  have  much  more  structure.  In 
particular,  being  closed  under  vector  addition  and  scalar  multiplication  means  a 
subspace  is  also  closed  under  linear  combinations. 
Reading Questions

1. Summarize the three conditions that allow us to quickly test if a set is a subspace.

2. Consider the set of vectors

$$W = \left\{ \begin{bmatrix} a \\ b \\ c \end{bmatrix} \,\middle|\, 3a - 2b + c = 5 \right\}$$

Is the set $W$ a subspace of $\mathbb{C}^3$? Explain your answer.

3. Name five general constructions of sets of column vectors (subsets of $\mathbb{C}^m$) that we now
know as subspaces.


Exercises 

015^  Working  within  the  vector  space  C3,  determine  if  b 


/( 

'3 

T 

'l' 

~2 

w=(\ 

2 

1 

0 

1 

1 

\\ 

3 

3 

0 

3 

W = 


1 

2 

-1 

1 


W = 


016^  Working  within  the  vector  space  C4,  determine  if  b = 


017^  Working  within  the  vector  space  C4,  determine  if  b = 


is  in  the  subspace  W, 


is  in  the  subspace  W, 


is  in  the  subspace  W, 


C20† Working within the vector space $P_3$ of polynomials of degree 3 or less, determine if
$p(x) = x^3 + 6x + 4$ is in the subspace $W$ below.

$$W = \langle \{\, x^3 + x^2 + x,\ x^3 + 2x - 6,\ x^2 - 5 \,\} \rangle$$


C2C  Consider  the  subspace 


4 0 
2 3 


-3 

2 


1 

1 
of the vector space of $2 \times 2$ matrices, $M_{22}$. Is $C = \begin{bmatrix} -3 & 3 \\ 6 & -4 \end{bmatrix}$ an element of $W$?


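C25 Show that the set $W = \left\{ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \,\middle|\, 3x_1 - 5x_2 = 12 \right\}$ from Example NSC2Z fails Property
AC and Property SC.

C26 Show that the set $Y = \left\{ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \,\middle|\, x_1 \in \mathbb{Z},\ x_2 \in \mathbb{Z} \right\}$ from Example NSC2S has Property
AC.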

M20† In $\mathbb{C}^3$, the vector space of column vectors of size 3, prove that the set $Z$ is a
subspace.

$$Z = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \,\middle|\, 4x_1 - x_2 + 5x_3 = 0 \right\}$$

T20† A square matrix $A$ of size $n$ is upper triangular if $[A]_{ij} = 0$ whenever $i > j$. Let
$UT_n$ be the set of all upper triangular matrices of size $n$. Prove that $UT_n$ is a subspace of
the vector space of all square matrices of size $n$, $M_{nn}$.

T30† Let $P$ be the set of all polynomials, of any degree. The set $P$ is a vector space. Let
$E$ be the subset of $P$ consisting of all polynomials with only terms of even degree. Prove or
disprove: the set $E$ is a subspace of $P$.

T31† Let $P$ be the set of all polynomials, of any degree. The set $P$ is a vector space. Let
$F$ be the subset of $P$ consisting of all polynomials with only terms of odd degree. Prove or
disprove: the set $F$ is a subspace of $P$.


Section  LISS 

Linear  Independence  and  Spanning  Sets 

A vector  space  is  defined  as  a set  with  two  operations,  meeting  ten  properties 
(Definition  VS).  Just  as  the  definition  of  span  of  a set  of  vectors  only  required 
knowing  how  to  add  vectors  and  how  to  multiply  vectors  by  scalars,  so  it  is  with 
linear  independence.  A definition  of  a linearly  independent  set  of  vectors  in  an 
arbitrary  vector  space  only  requires  knowing  how  to  form  linear  combinations  and 
equating  these  with  the  zero  vector.  Since  every  vector  space  must  have  a zero  vector 
(Property  Z),  we  always  have  a zero  vector  at  our  disposal. 

In  this  section  we  will  also  put  a twist  on  the  notion  of  the  span  of  a set  of 
vectors.  Rather  than  beginning  with  a set  of  vectors  and  creating  a subspace  that  is 
the  span,  we  will  instead  begin  with  a subspace  and  look  for  a set  of  vectors  whose 
span  equals  the  subspace. 

The  combination  of  linear  independence  and  spanning  will  be  very  important 
going  forward. 

Subsection  LI 
Linear  Independence 

Our  previous  definition  of  linear  independence  (Definition  LICV)  employed  a relation 
of  linear  dependence  that  was  a linear  combination  on  one  side  of  an  equality  and  a 
zero  vector  on  the  other  side.  As  a linear  combination  in  a vector  space  (Definition 
LC)  depends  only  on  vector  addition  and  scalar  multiplication,  and  every  vector 
space  must  have  a zero  vector  (Property  Z),  we  can  extend  our  definition  of  linear 
independence from the setting of $\mathbb{C}^m$ to the setting of a general vector space $V$ with
almost no changes. Compare these next two definitions with Definition RLDCV and
Definition LICV.

Definition RLD Relation of Linear Dependence

Suppose that $V$ is a vector space. Given a set of vectors $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_n\}$,
an equation of the form

$$\alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \cdots + \alpha_n\mathbf{u}_n = \mathbf{0}$$

is a relation of linear dependence on $S$. If this equation is formed in a trivial
fashion, i.e. $\alpha_i = 0$, $1 \le i \le n$, then we say it is a trivial relation of linear
dependence on $S$.

□

Definition LI Linear Independence

Suppose that $V$ is a vector space. The set of vectors $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_n\}$ from
$V$ is linearly dependent if there is a relation of linear dependence on $S$ that is not
trivial. In the case where the only relation of linear dependence on $S$ is the trivial
one, then $S$ is a linearly independent set of vectors. □

Notice  the  emphasis  on  the  word  “only.”  This  might  remind  you  of  the  definition 
of  a nonsingular  matrix,  where  if  the  matrix  is  employed  as  the  coefficient  matrix  of 
a homogeneous  system  then  the  only  solution  is  the  trivial  one. 

Example LIP4 Linear independence in $P_4$

In the vector space of polynomials with degree 4 or less, $P_4$ (Example VSP) consider
the set $S$ below

$$\{\, 2x^4 + 3x^3 + 2x^2 - x + 10,\ -x^4 - 2x^3 + x^2 + 5x - 8,\ 2x^4 + x^3 + 10x^2 + 17x - 2 \,\}$$

Is this set of vectors linearly independent or dependent? Consider that

$$\begin{aligned}
&3(2x^4 + 3x^3 + 2x^2 - x + 10) + 4(-x^4 - 2x^3 + x^2 + 5x - 8) \\
&\quad + (-1)(2x^4 + x^3 + 10x^2 + 17x - 2) = 0x^4 + 0x^3 + 0x^2 + 0x + 0 = \mathbf{0}
\end{aligned}$$

This is a nontrivial relation of linear dependence (Definition RLD) on the set $S$ and
so convinces us that $S$ is linearly dependent (Definition LI).

Now, I hear you say, "Where did those scalars come from?" Do not worry about
that right now, just be sure you understand why the above explanation is sufficient to
prove that $S$ is linearly dependent. The remainder of the example will demonstrate
how we might find these scalars if they had not been provided so readily.

Let us look at another set of vectors (polynomials) from $P_4$. Let

$$\begin{aligned}
T = \{\, &3x^4 - 2x^3 + 4x^2 + 6x - 1,\ -3x^4 + 1x^3 + 0x^2 + 4x + 2, \\
&4x^4 + 5x^3 - 2x^2 + 3x + 1,\ 2x^4 - 7x^3 + 4x^2 + 2x + 1 \,\}
\end{aligned}$$

Suppose we have a relation of linear dependence on this set,

$$\begin{aligned}
\mathbf{0} &= 0x^4 + 0x^3 + 0x^2 + 0x + 0 \\
&= \alpha_1(3x^4 - 2x^3 + 4x^2 + 6x - 1) + \alpha_2(-3x^4 + 1x^3 + 0x^2 + 4x + 2) \\
&\quad + \alpha_3(4x^4 + 5x^3 - 2x^2 + 3x + 1) + \alpha_4(2x^4 - 7x^3 + 4x^2 + 2x + 1)
\end{aligned}$$

Using our definitions of vector addition and scalar multiplication in $P_4$ (Example
VSP), we arrive at,

$$\begin{aligned}
0x^4 + 0x^3 + 0x^2 + 0x + 0 = {} &(3\alpha_1 - 3\alpha_2 + 4\alpha_3 + 2\alpha_4)x^4 + (-2\alpha_1 + \alpha_2 + 5\alpha_3 - 7\alpha_4)x^3 \\
&+ (4\alpha_1 - 2\alpha_3 + 4\alpha_4)x^2 + (6\alpha_1 + 4\alpha_2 + 3\alpha_3 + 2\alpha_4)x \\
&+ (-\alpha_1 + 2\alpha_2 + \alpha_3 + \alpha_4)
\end{aligned}$$

Equating coefficients, we arrive at the homogeneous system of equations,

$$\begin{aligned}
3\alpha_1 - 3\alpha_2 + 4\alpha_3 + 2\alpha_4 &= 0 \\
-2\alpha_1 + \alpha_2 + 5\alpha_3 - 7\alpha_4 &= 0 \\
4\alpha_1 - 2\alpha_3 + 4\alpha_4 &= 0 \\
6\alpha_1 + 4\alpha_2 + 3\alpha_3 + 2\alpha_4 &= 0 \\
-\alpha_1 + 2\alpha_2 + \alpha_3 + \alpha_4 &= 0
\end{aligned}$$

We form the coefficient matrix of this homogeneous system of equations and
row-reduce to find

$$\begin{bmatrix}
\boxed{1} & 0 & 0 & 0 \\
0 & \boxed{1} & 0 & 0 \\
0 & 0 & \boxed{1} & 0 \\
0 & 0 & 0 & \boxed{1} \\
0 & 0 & 0 & 0
\end{bmatrix}$$

We expected the system to be consistent (Theorem HSC) and so can compute
$n - r = 4 - 4 = 0$ and Theorem CSRN tells us that the solution is unique. Since
this is a homogeneous system, this unique solution is the trivial solution (Definition
TSHSE), $\alpha_1 = 0$, $\alpha_2 = 0$, $\alpha_3 = 0$, $\alpha_4 = 0$. So by Definition LI the set $T$ is linearly
independent.

A few observations. If we had discovered infinitely many solutions, then we could
have used one of the nontrivial solutions to provide a linear combination in the
manner we used to show that $S$ was linearly dependent. It is important to realize
that it is not interesting that we can create a relation of linear dependence with zero
scalars (we can always do that), but for $T$, this is the only way to create a relation
of linear dependence. It was no accident that we arrived at a homogeneous system
of equations in this example; it is related to our use of the zero vector in defining a
relation of linear dependence. It is easy to present a convincing statement that a set
is linearly dependent (just exhibit a nontrivial relation of linear dependence) but a
convincing statement of linear independence requires demonstrating that there is
no relation of linear dependence other than the trivial one. Notice how we relied
on theorems from Chapter SLE to provide this demonstration. Whew! There is a
lot going on in this example. Spend some time with it, we will be waiting patiently
right here when you get back. △


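Example LIM32 Linear independence in $M_{32}$

Consider the two sets of vectors $R$ and $S$ from the vector space of all $3 \times 2$ matrices,
$M_{32}$ (Example VSM)

\[
R = \left\{
\begin{bmatrix} 3 & -1 \\ 1 & 4 \\ 6 & -6 \end{bmatrix},\
\begin{bmatrix} -2 & 3 \\ 1 & -3 \\ -2 & -6 \end{bmatrix},\
\begin{bmatrix} 6 & -6 \\ -1 & 0 \\ 7 & -9 \end{bmatrix},\
\begin{bmatrix} 7 & 9 \\ -4 & -5 \\ 2 & 5 \end{bmatrix}
\right\}
\]

\[
S = \left\{
\begin{bmatrix} 2 & 0 \\ 1 & -1 \\ 1 & 3 \end{bmatrix},\
\begin{bmatrix} -4 & 0 \\ -2 & 2 \\ -2 & -6 \end{bmatrix},\
\begin{bmatrix} 1 & 1 \\ -2 & 1 \\ 2 & 4 \end{bmatrix},\
\begin{bmatrix} -5 & 3 \\ -10 & 7 \\ 2 & 0 \end{bmatrix}
\right\}
\]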

One set is linearly independent, the other is not. Which is which? Let us examine
$R$ first. Build a generic relation of linear dependence (Definition RLD),

\[
\alpha_1 \begin{bmatrix} 3 & -1 \\ 1 & 4 \\ 6 & -6 \end{bmatrix}
+ \alpha_2 \begin{bmatrix} -2 & 3 \\ 1 & -3 \\ -2 & -6 \end{bmatrix}
+ \alpha_3 \begin{bmatrix} 6 & -6 \\ -1 & 0 \\ 7 & -9 \end{bmatrix}
+ \alpha_4 \begin{bmatrix} 7 & 9 \\ -4 & -5 \\ 2 & 5 \end{bmatrix}
= \mathbf{0}
\]

Massaging  the  left-hand  side  with  our  definitions  of  vector  addition  and  scalar 
multiplication  in  M32  (Example  VSM)  we  obtain, 


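\[
\begin{bmatrix}
3\alpha_1 - 2\alpha_2 + 6\alpha_3 + 7\alpha_4 & -\alpha_1 + 3\alpha_2 - 6\alpha_3 + 9\alpha_4 \\
\alpha_1 + \alpha_2 - \alpha_3 - 4\alpha_4 & 4\alpha_1 - 3\alpha_2 - 5\alpha_4 \\
6\alpha_1 - 2\alpha_2 + 7\alpha_3 + 2\alpha_4 & -6\alpha_1 - 6\alpha_2 - 9\alpha_3 + 5\alpha_4
\end{bmatrix}
=
\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}
\]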

Using  our  definition  of  matrix  equality  (Definition  ME)  to  equate  entries  we  get 
the  homogeneous  system  of  six  equations  in  four  variables, 

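\[
\begin{aligned}
3\alpha_1 - 2\alpha_2 + 6\alpha_3 + 7\alpha_4 &= 0 \\
-\alpha_1 + 3\alpha_2 - 6\alpha_3 + 9\alpha_4 &= 0 \\
\alpha_1 + \alpha_2 - \alpha_3 - 4\alpha_4 &= 0 \\
4\alpha_1 - 3\alpha_2 - 5\alpha_4 &= 0 \\
6\alpha_1 - 2\alpha_2 + 7\alpha_3 + 2\alpha_4 &= 0 \\
-6\alpha_1 - 6\alpha_2 - 9\alpha_3 + 5\alpha_4 &= 0
\end{aligned}
\]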

Form  the  coefficient  matrix  of  this  homogeneous  system  and  row-reduce  to  obtain 


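\[
\begin{bmatrix}
\boxed{1} & 0 & 0 & 0 \\
0 & \boxed{1} & 0 & 0 \\
0 & 0 & \boxed{1} & 0 \\
0 & 0 & 0 & \boxed{1} \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]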

Analyzing this matrix we are led to conclude that $\alpha_1 = 0$, $\alpha_2 = 0$, $\alpha_3 = 0$, $\alpha_4 = 0$.
This means there is only a trivial relation of linear dependence on the vectors of $R$
and so we call $R$ a linearly independent set (Definition LI).

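So it must be that $S$ is linearly dependent. Let us see if we can find a nontrivial
relation of linear dependence on $S$. We will begin as with $R$, by constructing a
relation of linear dependence (Definition RLD) with unknown scalars,

\[
\alpha_1 \begin{bmatrix} 2 & 0 \\ 1 & -1 \\ 1 & 3 \end{bmatrix}
+ \alpha_2 \begin{bmatrix} -4 & 0 \\ -2 & 2 \\ -2 & -6 \end{bmatrix}
+ \alpha_3 \begin{bmatrix} 1 & 1 \\ -2 & 1 \\ 2 & 4 \end{bmatrix}
+ \alpha_4 \begin{bmatrix} -5 & 3 \\ -10 & 7 \\ 2 & 0 \end{bmatrix}
= \mathbf{0}
\]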

Massaging the left-hand side with our definitions of vector addition and scalar
multiplication in $M_{32}$ (Example VSM) we obtain,

\[
\begin{bmatrix}
2\alpha_1 - 4\alpha_2 + \alpha_3 - 5\alpha_4 & \alpha_3 + 3\alpha_4 \\
\alpha_1 - 2\alpha_2 - 2\alpha_3 - 10\alpha_4 & -\alpha_1 + 2\alpha_2 + \alpha_3 + 7\alpha_4 \\
\alpha_1 - 2\alpha_2 + 2\alpha_3 + 2\alpha_4 & 3\alpha_1 - 6\alpha_2 + 4\alpha_3
\end{bmatrix}
=
\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}
\]

Using  our  definition  of  matrix  equality  (Definition  ME)  to  equate  entries  we  get 
the  homogeneous  system  of  six  equations  in  four  variables, 

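\[
\begin{aligned}
2\alpha_1 - 4\alpha_2 + \alpha_3 - 5\alpha_4 &= 0 \\
\alpha_3 + 3\alpha_4 &= 0 \\
\alpha_1 - 2\alpha_2 - 2\alpha_3 - 10\alpha_4 &= 0 \\
-\alpha_1 + 2\alpha_2 + \alpha_3 + 7\alpha_4 &= 0 \\
\alpha_1 - 2\alpha_2 + 2\alpha_3 + 2\alpha_4 &= 0 \\
3\alpha_1 - 6\alpha_2 + 4\alpha_3 &= 0
\end{aligned}
\]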

Form  the  coefficient  matrix  of  this  homogeneous  system  and  row-reduce  to  obtain 

\[
\begin{bmatrix}
\boxed{1} & -2 & 0 & -4 \\
0 & 0 & \boxed{1} & 3 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]

Analyzing this we see that the system is consistent (we expected this since the
system is homogeneous, Theorem HSC) and has $n - r = 4 - 2 = 2$ free variables,
namely $\alpha_2$ and $\alpha_4$. This means there are infinitely many solutions, and in particular,
we can find a nontrivial solution, so long as we do not pick all of our free variables
to be zero. The mere presence of a nontrivial solution for these scalars is enough to
conclude that $S$ is a linearly dependent set (Definition LI). But let us go ahead and
explicitly construct a nontrivial relation of linear dependence.

Choose $\alpha_2 = 1$ and $\alpha_4 = -1$. There is nothing special about this choice; there
are infinitely many possibilities, some “easier” than this one. Just avoid picking both
variables to be zero. (Why not?) Then we find the dependent variables to have values
$\alpha_1 = -2$ and $\alpha_3 = 3$. So the relation of linear dependence,


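\[
(-2) \begin{bmatrix} 2 & 0 \\ 1 & -1 \\ 1 & 3 \end{bmatrix}
+ (1) \begin{bmatrix} -4 & 0 \\ -2 & 2 \\ -2 & -6 \end{bmatrix}
+ (3) \begin{bmatrix} 1 & 1 \\ -2 & 1 \\ 2 & 4 \end{bmatrix}
+ (-1) \begin{bmatrix} -5 & 3 \\ -10 & 7 \\ 2 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}
\]

is an iron-clad demonstration that $S$ is linearly dependent. Can you construct another
such demonstration? △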
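
The row reductions in Example LIM32 can be checked mechanically. The following is a minimal sketch, not part of the original text: it flattens each 3 x 2 matrix into a list of six entries, computes the rank of each set with exact rational arithmetic, and confirms the nontrivial relation found above. The helper names `rank` and `combine` are illustrative choices. A set of four vectors is linearly independent exactly when this rank is 4.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def combine(scalars, vectors):
    """Linear combination of equal-length vectors."""
    return [sum(a * v[k] for a, v in zip(scalars, vectors))
            for k in range(len(vectors[0]))]

# Each 3 x 2 matrix is flattened row by row into six entries.
R = [[3, -1, 1, 4, 6, -6],
     [-2, 3, 1, -3, -2, -6],
     [6, -6, -1, 0, 7, -9],
     [7, 9, -4, -5, 2, 5]]
S = [[2, 0, 1, -1, 1, 3],
     [-4, 0, -2, 2, -2, -6],
     [1, 1, -2, 1, 2, 4],
     [-5, 3, -10, 7, 2, 0]]

rank_R = rank(R)                       # 4 pivots: only the trivial relation
rank_S = rank(S)                       # 2 pivots: 4 - 2 = 2 free variables
relation = combine([-2, 1, 3, -1], S)  # the relation built in the example
```

Listing the flattened matrices as rows gives the transpose of the coefficient matrix used in the example, which has the same rank.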

Example  LIC  Linearly  independent  set  in  the  crazy  vector  space 

Is the set $R = \{(1, 0),\ (6, 3)\}$ linearly independent in the crazy vector space $C$
(Example CVS)?

We begin with an arbitrary relation of linear dependence on $R$

\[
\mathbf{0} = a_1(1, 0) + a_2(6, 3) \qquad \text{Definition RLD}
\]


and  then  massage  it  to  a point  where  we  can  apply  the  definition  of  equality  in  C. 
Recall  the  definitions  of  vector  addition  and  scalar  multiplication  in  C are  not  what 
you  would  expect. 


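\[
\begin{aligned}
(-1, -1) &= \mathbf{0} && \text{Example CVS} \\
&= a_1(1, 0) + a_2(6, 3) && \text{Definition RLD} \\
&= (1a_1 + a_1 - 1,\ 0a_1 + a_1 - 1) + (6a_2 + a_2 - 1,\ 3a_2 + a_2 - 1) && \text{Example CVS} \\
&= (2a_1 - 1,\ a_1 - 1) + (7a_2 - 1,\ 4a_2 - 1) \\
&= (2a_1 - 1 + 7a_2 - 1 + 1,\ a_1 - 1 + 4a_2 - 1 + 1) && \text{Example CVS} \\
&= (2a_1 + 7a_2 - 1,\ a_1 + 4a_2 - 1)
\end{aligned}
\]

Equality in $C$ (Example CVS) then yields the two equations,

\[
\begin{aligned}
2a_1 + 7a_2 - 1 &= -1 \\
a_1 + 4a_2 - 1 &= -1
\end{aligned}
\]

which becomes the homogeneous system

\[
\begin{aligned}
2a_1 + 7a_2 &= 0 \\
a_1 + 4a_2 &= 0
\end{aligned}
\]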

Since the coefficient matrix of this system is nonsingular (check this!) the system
has only the trivial solution $a_1 = a_2 = 0$. By Definition LI the set $R$ is linearly
independent. Notice that even though the zero vector of $C$ is not what we might
have first suspected, a question about linear independence still concludes with a
question about a homogeneous system of equations. Hmmm. △
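
To experiment with Example LIC, here is a small sketch of the crazy vector space in Python. The two operations are read off the computation above (addition adds 1 to each coordinate sum; scaling by $a$ sends $x$ to $ax + a - 1$), and the function names `cadd`, `cscale` and `combo` are illustrative, not from the text. The search over a small integer grid is only a sanity check, not a proof; the proof is the nonsingular coefficient matrix.

```python
ZERO = (-1, -1)  # the zero vector of the crazy vector space C

def cadd(u, v):
    """Vector addition in C: (x1, x2) + (y1, y2) = (x1 + y1 + 1, x2 + y2 + 1)."""
    return (u[0] + v[0] + 1, u[1] + v[1] + 1)

def cscale(a, u):
    """Scalar multiplication in C: a(x1, x2) = (a*x1 + a - 1, a*x2 + a - 1)."""
    return (a * u[0] + a - 1, a * u[1] + a - 1)

def combo(a1, a2):
    """The linear combination a1(1, 0) + a2(6, 3), computed with C's operations."""
    return cadd(cscale(a1, (1, 0)), cscale(a2, (6, 3)))

# Determinant of the coefficient matrix [[2, 7], [1, 4]] of the homogeneous system.
det = 2 * 4 - 7 * 1

# All scalar pairs on a small grid whose combination equals the zero vector.
solutions = [(a1, a2)
             for a1 in range(-10, 11)
             for a2 in range(-10, 11)
             if combo(a1, a2) == ZERO]
```

On this grid the only pair producing the zero vector is the trivial one, in agreement with the example.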


Subsection  SS 
Spanning  Sets 

In a vector space $V$, suppose we are given a set of vectors $S \subseteq V$. Then we can
immediately construct a subspace, $\langle S \rangle$, using Definition SS and then be assured
by Theorem SSS that the construction does provide a subspace. We now turn the
situation upside-down. Suppose we are first given a subspace $W \subseteq V$. Can we find a
set $S$ so that $\langle S \rangle = W$? Typically $W$ is infinite and we are searching for a finite set
of vectors $S$ that we can combine in linear combinations and “build” all of $W$.

I like to think of S as the raw materials that are sufficient for the construction of
W. If you have nails, lumber, wire, copper pipe, drywall, plywood, carpet, shingles,
paint (and a few other things), then you can combine them in many different ways
to create a house (or infinitely many different houses for that matter). A fast-food
restaurant may have beef, chicken, beans, cheese, tortillas, taco shells and hot sauce
and from this small list of ingredients build a wide variety of items for sale. Or
maybe a better analogy comes from Ben Cordes: the additive primary colors (red,
green and blue) can be combined to create many different colors by varying the
intensity of each. The intensity is like a scalar multiple, and the combination of the
three intensities is like vector addition. The three individual colors, red, green and
blue, are the elements of the spanning set.

Because  we  will  use  terms  like  “spanned  by”  and  “spanning  set,”  there  is  the 
potential  for  confusion  with  “the  span.”  Come  back  and  reread  the  first  paragraph  of 
this  subsection  whenever  you  are  uncertain  about  the  difference.  Here  is  the  working 
definition. 

Definition  SSVS  Spanning  Set  of  a Vector  Space 

Suppose $V$ is a vector space. A subset $S$ of $V$ is a spanning set of $V$ if $\langle S \rangle = V$.
In this case, we also frequently say $S$ spans $V$. □

The definition of a spanning set requires that two sets (subspaces actually) be
equal. If $S$ is a subset of $V$, then $\langle S \rangle \subseteq V$, always. Thus it is usually only necessary
to prove that $V \subseteq \langle S \rangle$. Now would be a good time to review Definition SE.

Example  SSP4  Spanning  set  in  P4 
In Example SP4 we showed that

\[
W = \{\, p(x) \mid p \in P_4,\ p(2) = 0 \,\}
\]

is a subspace of $P_4$, the vector space of polynomials with degree at most 4 (Example
VSP). In this example, we will show that the set

\[
S = \{\, x - 2,\ x^2 - 4x + 4,\ x^3 - 6x^2 + 12x - 8,\ x^4 - 8x^3 + 24x^2 - 32x + 16 \,\}
\]

is a spanning set for $W$. To do this, we require that $W = \langle S \rangle$. This is an equality of
sets. We can check that every polynomial in $S$ has $x = 2$ as a root and therefore
$S \subseteq W$. Since $W$ is closed under addition and scalar multiplication, $\langle S \rangle \subseteq W$ also.

So it remains to show that $W \subseteq \langle S \rangle$ (Definition SE). To do this, begin by
choosing an arbitrary polynomial in $W$, say $r(x) = ax^4 + bx^3 + cx^2 + dx + e \in W$.
This polynomial is not as arbitrary as it would appear, since we also know it must
have $x = 2$ as a root. This translates to

\[
0 = a(2)^4 + b(2)^3 + c(2)^2 + d(2) + e = 16a + 8b + 4c + 2d + e
\]

as a condition on $r$.

We wish to show that $r$ is a polynomial in $\langle S \rangle$, that is, we want to show that $r$
can be written as a linear combination of the vectors (polynomials) in $S$. So let us
try.

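\[
\begin{aligned}
r(x) &= ax^4 + bx^3 + cx^2 + dx + e \\
&= \alpha_1 (x - 2) + \alpha_2 (x^2 - 4x + 4) + \alpha_3 (x^3 - 6x^2 + 12x - 8) \\
&\quad + \alpha_4 (x^4 - 8x^3 + 24x^2 - 32x + 16) \\
&= \alpha_4 x^4 + (\alpha_3 - 8\alpha_4) x^3 + (\alpha_2 - 6\alpha_3 + 24\alpha_4) x^2 \\
&\quad + (\alpha_1 - 4\alpha_2 + 12\alpha_3 - 32\alpha_4) x + (-2\alpha_1 + 4\alpha_2 - 8\alpha_3 + 16\alpha_4)
\end{aligned}
\]

Equating coefficients (vector equality in $P_4$) gives the system of five equations in
four variables,

\[
\begin{aligned}
\alpha_4 &= a \\
\alpha_3 - 8\alpha_4 &= b \\
\alpha_2 - 6\alpha_3 + 24\alpha_4 &= c \\
\alpha_1 - 4\alpha_2 + 12\alpha_3 - 32\alpha_4 &= d \\
-2\alpha_1 + 4\alpha_2 - 8\alpha_3 + 16\alpha_4 &= e
\end{aligned}
\]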


Any solution to this system of equations will provide the linear combination we
need to determine if $r \in \langle S \rangle$, but we need to be convinced there is a solution for any
values of $a, b, c, d, e$ that qualify $r$ to be a member of $W$. So the question is: is this
system of equations consistent? We will form the augmented matrix, and row-reduce.
(We probably need to do this by hand, since the matrix is symbolic; reversing the
order of the first four rows is the best way to start.) We obtain a matrix in reduced
row-echelon form

\[
\left[\begin{array}{cccc|c}
\boxed{1} & 0 & 0 & 0 & 32a + 12b + 4c + d \\
0 & \boxed{1} & 0 & 0 & 24a + 6b + c \\
0 & 0 & \boxed{1} & 0 & 8a + b \\
0 & 0 & 0 & \boxed{1} & a \\
0 & 0 & 0 & 0 & 16a + 8b + 4c + 2d + e
\end{array}\right]
=
\left[\begin{array}{cccc|c}
\boxed{1} & 0 & 0 & 0 & 32a + 12b + 4c + d \\
0 & \boxed{1} & 0 & 0 & 24a + 6b + c \\
0 & 0 & \boxed{1} & 0 & 8a + b \\
0 & 0 & 0 & \boxed{1} & a \\
0 & 0 & 0 & 0 & 0
\end{array}\right]
\]


For your results to match our first matrix, you may find it necessary to multiply
the final row of your row-reduced matrix by the appropriate scalar, and/or add
multiples of this row to some of the other rows. To obtain the second version of the
matrix, the last entry of the last column has been simplified to zero according to the
one condition we were able to impose on an arbitrary polynomial from $W$. Since the
last column is not a pivot column, Theorem RCLS tells us this system is consistent.
Therefore, any polynomial from $W$ can be written as a linear combination of the
polynomials in $S$, so $W \subseteq \langle S \rangle$. Therefore, $W = \langle S \rangle$ and $S$ is a spanning set for $W$
by Definition SSVS.

Notice that an alternative to row-reducing the augmented matrix by hand would
be to appeal to Theorem FS by expressing the column space of the coefficient matrix
as a null space, and then verifying that the condition on $r$ guarantees that $r$ is in
the column space, thus implying that the system is always consistent. Give it a try;
we will wait. This has been a complicated example, but worth studying carefully. △
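
The conclusion of Example SSP4 can be spot-checked numerically. The sketch below is an illustration under stated assumptions, not the book's method: polynomials are stored as coefficient lists from the constant term up to $x^4$, the scalars are read from the last column of the row-reduced augmented matrix, and the names `poly_combo`, `scalars_for` and `member_of_W` are hypothetical helpers.

```python
def poly_combo(scalars, polys):
    """Linear combination of polynomials stored as [x^0, x^1, ..., x^4] coefficients."""
    return [sum(s * p[k] for s, p in zip(scalars, polys)) for k in range(5)]

# The four polynomials of S, constant term first.
S = [[-2, 1, 0, 0, 0],        # x - 2
     [4, -4, 1, 0, 0],        # x^2 - 4x + 4
     [-8, 12, -6, 1, 0],      # x^3 - 6x^2 + 12x - 8
     [16, -32, 24, -8, 1]]    # x^4 - 8x^3 + 24x^2 - 32x + 16

def scalars_for(a, b, c, d):
    """alpha_1..alpha_4 from the last column of the row-reduced augmented matrix."""
    return [32*a + 12*b + 4*c + d, 24*a + 6*b + c, 8*a + b, a]

def member_of_W(a, b, c, d):
    """r(x) = ax^4 + bx^3 + cx^2 + dx + e, with e forced by the condition r(2) = 0."""
    e = -(16*a + 8*b + 4*c + 2*d)
    return [e, d, c, b, a]

r = member_of_W(1, -2, 3, 5)                      # one sample polynomial from W
rebuilt = poly_combo(scalars_for(1, -2, 3, 5), S) # the same polynomial built from S
```

For the sample choice $a = 1$, $b = -2$, $c = 3$, $d = 5$ the scalars are 25, 15, 6, 1, and the combination reproduces $r$ coefficient by coefficient.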

Given  a subspace  and  a set  of  vectors,  as  in  Example  SSP4  it  can  take  some 
work  to  determine  that  the  set  actually  is  a spanning  set.  An  even  harder  problem  is 
to  be  confronted  with  a subspace  and  required  to  construct  a spanning  set  with  no 
guidance.  We  will  now  work  an  example  of  this  flavor,  but  some  of  the  steps  will  be 
unmotivated.  Fortunately,  we  will  have  some  better  tools  for  this  type  of  problem 
later  on. 


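Example SSM22 Spanning set in $M_{22}$

In the space of all $2 \times 2$ matrices, $M_{22}$, consider the subspace

\[
Z = \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \,\middle|\, a + 3b - c - 5d = 0,\ -2a - 6b + 3c + 14d = 0 \right\}
\]

and find a spanning set for $Z$.

We need to construct a limited number of matrices in $Z$ so that every matrix
in $Z$ can be expressed as a linear combination of this limited number of matrices.
Suppose that $B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is a matrix in $Z$. Then we can form a column vector with
the entries of $B$ and write

\[
\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} \in \mathcal{N}\!\left( \begin{bmatrix} 1 & 3 & -1 & -5 \\ -2 & -6 & 3 & 14 \end{bmatrix} \right)
\]

Row-reducing this matrix and applying Theorem REMES we obtain the equivalent
statement,

\[
\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} \in \mathcal{N}\!\left( \begin{bmatrix} \boxed{1} & 3 & 0 & -1 \\ 0 & 0 & \boxed{1} & 4 \end{bmatrix} \right)
\]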

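We can then express the subspace $Z$ in the following equal forms,

\[
\begin{aligned}
Z &= \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \,\middle|\, a + 3b - c - 5d = 0,\ -2a - 6b + 3c + 14d = 0 \right\} \\
&= \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \,\middle|\, a + 3b - d = 0,\ c + 4d = 0 \right\} \\
&= \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \,\middle|\, a = -3b + d,\ c = -4d \right\} \\
&= \left\{ \begin{bmatrix} -3b + d & b \\ -4d & d \end{bmatrix} \,\middle|\, b, d \in \mathbb{C} \right\} \\
&= \left\{ \begin{bmatrix} -3b & b \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} d & 0 \\ -4d & d \end{bmatrix} \,\middle|\, b, d \in \mathbb{C} \right\} \\
&= \left\{ b \begin{bmatrix} -3 & 1 \\ 0 & 0 \end{bmatrix} + d \begin{bmatrix} 1 & 0 \\ -4 & 1 \end{bmatrix} \,\middle|\, b, d \in \mathbb{C} \right\} \\
&= \left\langle \left\{ \begin{bmatrix} -3 & 1 \\ 0 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & 0 \\ -4 & 1 \end{bmatrix} \right\} \right\rangle
\end{aligned}
\]

So the set

\[
Q = \left\{ \begin{bmatrix} -3 & 1 \\ 0 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & 0 \\ -4 & 1 \end{bmatrix} \right\}
\]

spans $Z$ by Definition SSVS. △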
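
As a quick check on Example SSM22, the sketch below tests membership in $Z$ directly from the two defining equations and confirms that every combination $bQ_1 + dQ_2$ lands in $Z$ with the expected entries. The names `in_Z`, `Q1`, `Q2` and `combo` are illustrative, and integer scalars stand in for arbitrary complex ones.

```python
def in_Z(m):
    """Membership test for Z using its two defining equations."""
    (a, b), (c, d) = m
    return a + 3*b - c - 5*d == 0 and -2*a - 6*b + 3*c + 14*d == 0

# The two matrices of the proposed spanning set Q, stored as rows.
Q1 = ((-3, 1), (0, 0))
Q2 = ((1, 0), (-4, 1))

def combo(b, d):
    """The matrix b*Q1 + d*Q2, computed entry by entry."""
    return tuple(tuple(b * x + d * y for x, y in zip(r1, r2))
                 for r1, r2 in zip(Q1, Q2))

sample = combo(2, 5)  # one particular element of the span of Q
```

Note that the scalars needed for a given matrix in $Z$ are simply its entries $b$ and $d$, which is what the chain of set equalities above expresses.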


Example  SSC  Spanning  set  in  the  crazy  vector  space 

In Example LIC we determined that the set $R = \{(1, 0),\ (6, 3)\}$ is linearly independent
in the crazy vector space $C$ (Example CVS). We now show that $R$ is a spanning
set for $C$.

Given an arbitrary vector $(x, y) \in C$ we desire to show that it can be written as
a linear combination of the elements of $R$. In other words, are there scalars $a_1$ and
$a_2$ so that

\[
(x, y) = a_1(1, 0) + a_2(6, 3)?
\]

We will act as if this equation is true and try to determine just what $a_1$ and
$a_2$ would be (as functions of $x$ and $y$). Recall that our vector space operations are
unconventional and are defined in Example CVS.

\[
\begin{aligned}
(x, y) &= a_1(1, 0) + a_2(6, 3) \\
&= (1a_1 + a_1 - 1,\ 0a_1 + a_1 - 1) + (6a_2 + a_2 - 1,\ 3a_2 + a_2 - 1) \\
&= (2a_1 - 1,\ a_1 - 1) + (7a_2 - 1,\ 4a_2 - 1) \\
&= (2a_1 - 1 + 7a_2 - 1 + 1,\ a_1 - 1 + 4a_2 - 1 + 1) \\
&= (2a_1 + 7a_2 - 1,\ a_1 + 4a_2 - 1)
\end{aligned}
\]

Equality in $C$ then yields the two equations,

\[
\begin{aligned}
2a_1 + 7a_2 - 1 &= x \\
a_1 + 4a_2 - 1 &= y
\end{aligned}
\]

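which becomes the linear system with a matrix representation

\[
\begin{bmatrix} 2 & 7 \\ 1 & 4 \end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \end{bmatrix}
=
\begin{bmatrix} x + 1 \\ y + 1 \end{bmatrix}
\]

The coefficient matrix of this system is nonsingular, hence invertible (Theorem
NI), and we can employ its inverse to find a solution (Theorem TTMI, Theorem
SNCM),

\[
\begin{bmatrix} a_1 \\ a_2 \end{bmatrix}
=
\begin{bmatrix} 2 & 7 \\ 1 & 4 \end{bmatrix}^{-1}
\begin{bmatrix} x + 1 \\ y + 1 \end{bmatrix}
=
\begin{bmatrix} 4 & -7 \\ -1 & 2 \end{bmatrix}
\begin{bmatrix} x + 1 \\ y + 1 \end{bmatrix}
=
\begin{bmatrix} 4x - 7y - 3 \\ -x + 2y + 1 \end{bmatrix}
\]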

We could chase through the above implications backwards and take the existence
of these solutions as sufficient evidence for $R$ being a spanning set for $C$. Instead, let
us view the above as simply scratchwork and now get serious with a simple direct
proof that $R$ is a spanning set. Ready? Suppose $(x, y)$ is any vector from $C$, then
compute the following linear combination using the definitions of the operations in
$C$,

\[
\begin{aligned}
&(4x - 7y - 3)(1, 0) + (-x + 2y + 1)(6, 3) \\
&\quad = (1(4x - 7y - 3) + (4x - 7y - 3) - 1,\ 0(4x - 7y - 3) + (4x - 7y - 3) - 1) \\
&\qquad + (6(-x + 2y + 1) + (-x + 2y + 1) - 1,\ 3(-x + 2y + 1) + (-x + 2y + 1) - 1) \\
&\quad = (8x - 14y - 7,\ 4x - 7y - 4) + (-7x + 14y + 6,\ -4x + 8y + 3) \\
&\quad = ((8x - 14y - 7) + (-7x + 14y + 6) + 1,\ (4x - 7y - 4) + (-4x + 8y + 3) + 1) \\
&\quad = (x, y)
\end{aligned}
\]

This final sequence of computations in $C$ is sufficient to demonstrate that any
element of $C$ can be written (or expressed) as a linear combination of the two vectors
in $R$, so $C \subseteq \langle R \rangle$. Since the reverse inclusion $\langle R \rangle \subseteq C$ is trivially true, $C = \langle R \rangle$ and
we say $R$ spans $C$ (Definition SSVS). Notice that this demonstration is no more or
less valid if we hide from the reader our scratchwork that suggested $a_1 = 4x - 7y - 3$
and $a_2 = -x + 2y + 1$. △
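
The direct proof in Example SSC is easy to replay by machine. This is a minimal sketch, assuming the crazy vector space operations as they appear in the computation above; the function names `cadd`, `cscale`, `represent` and `rebuild` are illustrative only.

```python
def cadd(u, v):
    """Vector addition in C: (x1, x2) + (y1, y2) = (x1 + y1 + 1, x2 + y2 + 1)."""
    return (u[0] + v[0] + 1, u[1] + v[1] + 1)

def cscale(a, u):
    """Scalar multiplication in C: a(x1, x2) = (a*x1 + a - 1, a*x2 + a - 1)."""
    return (a * u[0] + a - 1, a * u[1] + a - 1)

def represent(x, y):
    """Scalars from the scratchwork: a1 = 4x - 7y - 3, a2 = -x + 2y + 1."""
    return (4 * x - 7 * y - 3, -x + 2 * y + 1)

def rebuild(x, y):
    """Form a1(1, 0) + a2(6, 3) in C with the scalars chosen for (x, y)."""
    a1, a2 = represent(x, y)
    return cadd(cscale(a1, (1, 0)), cscale(a2, (6, 3)))

target = (5, -2)
scalars = represent(*target)   # scalars for this particular vector
recovered = rebuild(*target)   # should reproduce the target
```

For the sample vector $(5, -2)$ the scalars are $a_1 = 31$ and $a_2 = -8$, and the combination returns $(5, -2)$.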


Subsection  VR 
Vector  Representation 


In  Chapter  R we  will  take  up  the  matter  of  representations  fully,  where  Theorem 
VRRB  will  be  critical  for  Definition  VR.  We  will  now  motivate  and  prove  a critical 
theorem  that  tells  us  how  to  “represent”  a vector.  This  theorem  could  wait,  but 
working  with  it  now  will  provide  some  extra  insight  into  the  nature  of  linearly 
independent  spanning  sets.  First  an  example,  then  the  theorem. 


Example  AVR  A vector  representation 
Consider the set

\[
S = \left\{
\begin{bmatrix} -7 \\ 5 \\ 1 \end{bmatrix},\
\begin{bmatrix} -6 \\ 5 \\ 0 \end{bmatrix},\
\begin{bmatrix} -12 \\ 7 \\ 4 \end{bmatrix}
\right\}
\]

from the vector space $\mathbb{C}^3$. Let $A$ be the matrix whose columns are the set $S$, and
verify that $A$ is nonsingular. By Theorem NMLIC the elements of $S$ form a linearly
independent set. Suppose that $\mathbf{b} \in \mathbb{C}^3$. Then $\mathcal{LS}(A, \mathbf{b})$ has a (unique) solution
(Theorem NMUS) and hence is consistent. By Theorem SLSLC, $\mathbf{b} \in \langle S \rangle$. Since $\mathbf{b}$
is arbitrary, this is enough to show that $\langle S \rangle = \mathbb{C}^3$, and therefore $S$ is a spanning
set for $\mathbb{C}^3$ (Definition SSVS). (This set comes from the columns of the coefficient
matrix of Archetype B.)


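Now examine the situation for a particular choice of $\mathbf{b}$, say $\mathbf{b} = \begin{bmatrix} -33 \\ 24 \\ 5 \end{bmatrix}$. Because
$S$ is a spanning set for $\mathbb{C}^3$, we know we can write $\mathbf{b}$ as a linear combination of the
vectors in $S$,

\[
\begin{bmatrix} -33 \\ 24 \\ 5 \end{bmatrix}
= (-3) \begin{bmatrix} -7 \\ 5 \\ 1 \end{bmatrix}
+ (5) \begin{bmatrix} -6 \\ 5 \\ 0 \end{bmatrix}
+ (2) \begin{bmatrix} -12 \\ 7 \\ 4 \end{bmatrix}
\]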

The nonsingularity of the matrix $A$ tells us that the scalars in this linear combination
are unique. More precisely, it is the linear independence of $S$ that provides the
uniqueness. We will refer to the scalars $a_1 = -3$, $a_2 = 5$, $a_3 = 2$ as a “representation
of $\mathbf{b}$ relative to $S$.” In other words, once we settle on $S$ as a linearly independent
set that spans $\mathbb{C}^3$, the vector $\mathbf{b}$ is recoverable just by knowing the scalars $a_1 = -3$,
$a_2 = 5$, $a_3 = 2$ (use these scalars in a linear combination of the vectors in $S$). This is
all an illustration of the following important theorem, which we prove in the setting
of a general vector space. △
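
The representation in Example AVR can be recomputed from scratch. The sketch below is an illustration, not the book's method: it uses Cramer's rule (a technique not used in the text above) with exact fractions to solve for the scalars, which works here precisely because the determinant of $A$ is nonzero. The names `det3` and `representation` are hypothetical helpers.

```python
from fractions import Fraction

def det3(rows):
    """Determinant of a 3 x 3 array given as three rows (expansion along row 0)."""
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

S = [(-7, 5, 1), (-6, 5, 0), (-12, 7, 4)]  # the vectors of S, i.e. the columns of A
b = (-33, 24, 5)

# det(A) equals det of the transpose, so listing the columns as rows is harmless.
det_A = det3(S)

def representation(target):
    """Cramer's rule: replace the i-th column of A by target and divide by det(A)."""
    return [Fraction(det3(S[:i] + [target] + S[i + 1:]), det_A) for i in range(3)]

scalars = representation(b)
rebuilt = tuple(sum(a * v[k] for a, v in zip(scalars, S)) for k in range(3))
```

A nonzero determinant means exactly one list of scalars exists, which matches the uniqueness discussed in the example.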


Theorem  VRRB  Vector  Representation  Relative  to  a Basis 
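Suppose that $V$ is a vector space and $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_m\}$ is a linearly
independent set that spans $V$. Let $\mathbf{w}$ be any vector in $V$. Then there exist unique scalars
$a_1, a_2, a_3, \ldots, a_m$ such that

\[
\mathbf{w} = a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + a_3\mathbf{v}_3 + \cdots + a_m\mathbf{v}_m
\]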


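Proof. That $\mathbf{w}$ can be written as a linear combination of the vectors in $B$ follows
from the spanning property of the set (Definition SSVS). This is good, but not the
meat of this theorem. We now know that for any choice of the vector $\mathbf{w}$ there exist
some scalars that will create $\mathbf{w}$ as a linear combination of the basis vectors. The
real question is: Is there more than one way to write $\mathbf{w}$ as a linear combination of
$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_m\}$? Are the scalars $a_1, a_2, a_3, \ldots, a_m$ unique? (Proof Technique
U)

Assume there are two different linear combinations of $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_m\}$
that equal the vector $\mathbf{w}$. In other words there exist scalars $a_1, a_2, a_3, \ldots, a_m$ and
$b_1, b_2, b_3, \ldots, b_m$ so that

\[
\begin{aligned}
\mathbf{w} &= a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + a_3\mathbf{v}_3 + \cdots + a_m\mathbf{v}_m \\
\mathbf{w} &= b_1\mathbf{v}_1 + b_2\mathbf{v}_2 + b_3\mathbf{v}_3 + \cdots + b_m\mathbf{v}_m.
\end{aligned}
\]

Then notice that

\[
\begin{aligned}
\mathbf{0} &= \mathbf{w} + (-\mathbf{w}) && \text{Property AI} \\
&= \mathbf{w} + (-1)\mathbf{w} && \text{Theorem AISM} \\
&= (a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + a_3\mathbf{v}_3 + \cdots + a_m\mathbf{v}_m) \\
&\quad + (-1)(b_1\mathbf{v}_1 + b_2\mathbf{v}_2 + b_3\mathbf{v}_3 + \cdots + b_m\mathbf{v}_m) \\
&= (a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + a_3\mathbf{v}_3 + \cdots + a_m\mathbf{v}_m) \\
&\quad + (-b_1\mathbf{v}_1 - b_2\mathbf{v}_2 - b_3\mathbf{v}_3 - \cdots - b_m\mathbf{v}_m) && \text{Property DVA} \\
&= (a_1 - b_1)\mathbf{v}_1 + (a_2 - b_2)\mathbf{v}_2 + (a_3 - b_3)\mathbf{v}_3 \\
&\quad + \cdots + (a_m - b_m)\mathbf{v}_m && \text{Property C, Property DSA}
\end{aligned}
\]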

But this is a relation of linear dependence on a linearly independent set of
vectors (Definition RLD)! Now we are using the other assumption about $B$, that
$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_m\}$ is a linearly independent set. So by Definition LI it must
happen that the scalars are all zero. That is,

\[
\begin{aligned}
(a_1 - b_1) = 0 \quad (a_2 - b_2) = 0 \quad (a_3 - b_3) = 0 \quad &\ldots \quad (a_m - b_m) = 0 \\
a_1 = b_1 \quad a_2 = b_2 \quad a_3 = b_3 \quad &\ldots \quad a_m = b_m.
\end{aligned}
\]

And so we find that the scalars are unique. ■


The  converse  of  Theorem  VRRB  is  true  as  well,  but  is  not  important  enough  to 
rise  beyond  an  exercise  (see  Exercise  LISS.T51). 

This is a very typical use of the hypothesis that a set is linearly independent:
obtain a relation of linear dependence and then conclude that the scalars must all
be zero. The result of this theorem tells us that we can write any vector in a vector
space as a linear combination of the vectors in a linearly independent spanning set,
but only just. There is only enough raw material in the spanning set to write each
vector one way as a linear combination. So in this sense, we could call a linearly
independent spanning set a “minimal spanning set.” These sets are so important
that we will give them a simpler name (“basis”) and explore their properties further
in the next section.


Reading  Questions 


1. Is the set of matrices below linearly independent or linearly dependent in the vector
space $M_{22}$? Why or why not?

\[
\left\{
\begin{bmatrix} 1 & 3 \\ -2 & 4 \end{bmatrix},\
\begin{bmatrix} -2 & 3 \\ 3 & -5 \end{bmatrix},\
\begin{bmatrix} 0 & 9 \\ -1 & 3 \end{bmatrix}
\right\}
\]

2. Explain the difference between the following two uses of the term “span”:

   1. $S$ is a subset of the vector space $V$ and the span of $S$ is a subspace of $V$.
   2. $W$ is a subspace of the vector space $Y$ and $T$ spans $W$.

3. The set

\[
S = \left\{
\begin{bmatrix} 6 \\ 2 \\ 1 \end{bmatrix},\
\begin{bmatrix} 4 \\ -3 \\ 1 \end{bmatrix},\
\begin{bmatrix} 5 \\ 8 \\ 2 \end{bmatrix}
\right\}
\]

is linearly independent and spans $\mathbb{C}^3$. Write the vector $\mathbf{x} = \begin{bmatrix} -6 \\ 2 \\ 2 \end{bmatrix}$ as a linear combination
of the elements of $S$. How many ways are there to answer this question, and which
theorem allows you to say so?


Exercises 

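C20† In the vector space of $2 \times 2$ matrices, $M_{22}$, determine if the set $S$ below is linearly
independent.

\[
S = \left\{
\begin{bmatrix} 2 & -1 \\ 1 & 3 \end{bmatrix},\
\begin{bmatrix} 0 & 4 \\ -1 & 2 \end{bmatrix},\
\begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix}
\right\}
\]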


C21† In the crazy vector space $C$ (Example CVS), is the set $S = \{(0, 2),\ (2, 8)\}$ linearly
independent?

C22† In the vector space of polynomials $P_3$, determine if the set $S$ is linearly independent
or linearly dependent.

\[
S = \{\, 2 + x - 3x^2 - 8x^3,\ 1 + x + x^2 + 5x^3,\ 3 - 4x^2 - 7x^3 \,\}
\]

C23† Determine if the set $S = \{(3, 1),\ (7, 3)\}$ is linearly independent in the crazy vector
space $C$ (Example CVS).

C24† In the vector space of real-valued functions $F = \{\, f \mid f : \mathbb{R} \to \mathbb{R} \,\}$, determine if the
following set $S$ is linearly independent.

\[
S = \{\, \sin^2 x,\ \cos^2 x,\ 2 \,\}
\]


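C25† Let

\[
S = \left\{
\begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix},\
\begin{bmatrix} 2 & 1 \\ -1 & 2 \end{bmatrix},\
\begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix}
\right\}
\]

1. Determine if $S$ spans $M_{22}$.

2. Determine if $S$ is linearly independent.

C26† Let

\[
S = \left\{
\begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix},\
\begin{bmatrix} 2 & 1 \\ -1 & 2 \end{bmatrix},\
\begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix},\
\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix},\
\begin{bmatrix} 1 & 4 \\ 0 & 3 \end{bmatrix}
\right\}
\]

1. Determine if $S$ spans $M_{22}$.

2. Determine if $S$ is linearly independent.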


C30 In Example LIM32, find another nontrivial relation of linear dependence on the
linearly dependent set of $3 \times 2$ matrices, $S$.

C40† Determine if the set $T = \{\, x^2 - x + 5,\ 4x^3 - x^2 + 5x,\ 3x + 2 \,\}$ spans the vector
space of polynomials with degree 4 or less, $P_4$.

C41† The set $W$ is a subspace of $M_{22}$, the vector space of all $2 \times 2$ matrices. Prove that
$S$ is a spanning set for $W$.

\[
W = \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \,\middle|\, 2a - 3b + 4c - d = 0 \right\}
\qquad
S = \left\{
\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix},\
\begin{bmatrix} 0 & 1 \\ 0 & -3 \end{bmatrix},\
\begin{bmatrix} 0 & 0 \\ 1 & 4 \end{bmatrix}
\right\}
\]

C42† Determine if the set $S = \{(3, 1),\ (7, 3)\}$ spans the crazy vector space $C$ (Example
CVS).

M10† Halfway through Example SSP4, we need to show that the system of equations

\[
\mathcal{LS}\!\left(
\begin{bmatrix}
0 & 0 & 0 & 1 \\
0 & 0 & 1 & -8 \\
0 & 1 & -6 & 24 \\
1 & -4 & 12 & -32 \\
-2 & 4 & -8 & 16
\end{bmatrix},\
\begin{bmatrix} a \\ b \\ c \\ d \\ e \end{bmatrix}
\right)
\]

is consistent for every choice of the vector of constants satisfying $16a + 8b + 4c + 2d + e = 0$.

Express the column space of the coefficient matrix of this system as a null space,
using Theorem FS. From this use Theorem CSCS to establish that the system is always
consistent. Notice that this approach removes from Example SSP4 the need to row-reduce
a symbolic matrix.

T20† Suppose that $S$ is a finite linearly independent set of vectors from the vector space
$V$. Let $T$ be any subset of $S$. Prove that $T$ is linearly independent.

T40 Prove the following variant of Theorem EMMVP that has a weaker hypothesis: Suppose
that $C = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_p\}$ is a linearly independent spanning set for $\mathbb{C}^n$. Suppose
also that $A$ and $B$ are $m \times n$ matrices such that $A\mathbf{u}_i = B\mathbf{u}_i$ for every $1 \le i \le n$. Then $A = B$.

Can you weaken the hypothesis even further while still preserving the conclusion?

T50† Suppose that $V$ is a vector space and $\mathbf{u}, \mathbf{v} \in V$ are two vectors in $V$. Use the
definition of linear independence to prove that $S = \{\mathbf{u}, \mathbf{v}\}$ is a linearly dependent set if
and only if one of the two vectors is a scalar multiple of the other. Prove this directly in
the context of an abstract vector space ($V$), without simply giving an upgraded version of
Theorem DLDS for the special case of just two vectors.

T51† Carefully formulate the converse of Theorem VRRB and provide a proof.


Section  B 
Bases 


A basis  of  a vector  space  is  one  of  the  most  useful  concepts  in  linear  algebra.  It  often 
provides  a concise,  finite  description  of  an  infinite  vector  space. 

Subsection  B 
Bases 

We  now  have  all  the  tools  in  place  to  define  a basis  of  a vector  space. 

Definition  B Basis 

Suppose $V$ is a vector space. Then a subset $S \subseteq V$ is a basis of $V$ if it is linearly
independent and spans $V$. □

So, a basis is a linearly independent spanning set for a vector space. The
requirement that the set spans $V$ ensures that $S$ has enough raw material to build $V$,
while the linear independence requirement ensures that we do not have any more
raw material than we need. As we shall see soon in Section D, a basis is a minimal
spanning set.

You may have noticed that we used the term basis for some of the titles of previous
theorems (e.g. Theorem BNS, Theorem BCS, Theorem BRS) and if you review each
of these theorems you will see that their conclusions provide linearly independent
spanning sets for sets that we now recognize as subspaces of $\mathbb{C}^m$. Examples associated
with these theorems include Example NSLIL, Example CSOCD and Example IAS.
As we will see, these three theorems will continue to be powerful tools, even in the
setting of more general vector spaces.

Furthermore, the archetypes contain an abundance of bases. For each coefficient
matrix of a system of equations, and for each archetype defined simply as a matrix,
there is a basis for the null space, three bases for the column space, and a basis for
the row space. For this reason, our subsequent examples will concentrate on bases
for vector spaces other than $\mathbb{C}^m$.

Notice  that  Definition  B does  not  preclude  a vector  space  from  having  many 
bases,  and  this  is  the  case,  as  hinted  above  by  the  statement  that  the  archetypes 
contain  three  bases  for  the  column  space  of  a matrix.  More  generally,  we  can  grab 
any  basis  for  a vector  space,  multiply  any  one  basis  vector  by  a nonzero  scalar  and 
create  a slightly  different  set  that  is  still  a basis.  For  “important”  vector  spaces,  it 
will  be  convenient  to  have  a collection  of  “nice”  bases.  When  a vector  space  has 
a single  particularly  nice  basis,  it  is  sometimes  called  the  standard  basis  though 
there  is  nothing  precise  enough  about  this  term  to  allow  us  to  define  it  formally  — 
it  is  a question  of  style.  Here  are  some  nice  bases  for  important  vector  spaces. 

Theorem SUVB Standard Unit Vectors are a Basis
The set of standard unit vectors for $\mathbb{C}^m$ (Definition SUV), $B = \{\, \mathbf{e}_i \mid 1 \le i \le m \,\}$ is a
basis for the vector space $\mathbb{C}^m$.


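Proof. We must show that the set $B$ is both linearly independent and a spanning
set for $\mathbb{C}^m$. First, the vectors in $B$ are, by Definition SUV, the columns of the
identity matrix, which we know is nonsingular (since it row-reduces to the identity
matrix, Theorem NMRRI). And the columns of a nonsingular matrix are linearly
independent by Theorem NMLIC.

Suppose we grab an arbitrary vector from $\mathbb{C}^m$, say

\[
\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_m \end{bmatrix}.
\]

Can we write $\mathbf{v}$ as a linear combination of the vectors in $B$? Yes, and quite
simply.

\[
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_m \end{bmatrix}
= v_1 \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
+ v_2 \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
+ v_3 \begin{bmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}
+ \cdots
+ v_m \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}
\]

\[
\mathbf{v} = v_1\mathbf{e}_1 + v_2\mathbf{e}_2 + v_3\mathbf{e}_3 + \cdots + v_m\mathbf{e}_m
\]

This shows that $\mathbb{C}^m \subseteq \langle B \rangle$, which is sufficient to show that $B$ is a spanning set
for $\mathbb{C}^m$. ■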


Example  BP  Bases  for  Pn 

The vector space of polynomials with degree at most $n$, $P_n$, has the basis

\[
B = \{\, 1,\ x,\ x^2,\ x^3,\ \ldots,\ x^n \,\}.
\]

Another nice basis for $P_n$ is

\[
C = \{\, 1,\ 1 + x,\ 1 + x + x^2,\ 1 + x + x^2 + x^3,\ \ldots,\ 1 + x + x^2 + x^3 + \cdots + x^n \,\}.
\]

Checking that each of $B$ and $C$ is a linearly independent spanning set are good
exercises. △

Example  BM  A basis  for  the  vector  space  of  matrices 

In the vector space $M_{mn}$ of matrices (Example VSM) define the matrices $B_{k\ell}$,
$1 \le k \le m$, $1 \le \ell \le n$ by

\[
[B_{k\ell}]_{ij} =
\begin{cases}
1 & \text{if } k = i,\ \ell = j \\
0 & \text{otherwise}
\end{cases}
\]

So these matrices have entries that are all zeros, with the exception of a lone
entry that is one. The set of all $mn$ of them,

\[
B = \{\, B_{k\ell} \mid 1 \le k \le m,\ 1 \le \ell \le n \,\}
\]

forms a basis for $M_{mn}$. See Exercise B.M20. △

The  bases  described  above  will  often  be  convenient  ones  to  work  with.  However 
a basis  does  not  have  to  obviously  look  like  a basis. 

Example  BSP4  A basis  for  a subspace  of  P4 
In  Example  SSP4  we  showed  that 

\[ S = \{x - 2,\ x^2 - 4x + 4,\ x^3 - 6x^2 + 12x - 8,\ x^4 - 8x^3 + 24x^2 - 32x + 16\} \]

is a spanning set for $W = \{p(x) \mid p \in P_4,\ p(2) = 0\}$. We will now show that $S$ is also linearly independent in $W$. Begin with a relation of linear dependence,

\begin{align*}
0 + 0x + 0x^2 + 0x^3 + 0x^4
&= \alpha_1 (x - 2) + \alpha_2 (x^2 - 4x + 4) + \alpha_3 (x^3 - 6x^2 + 12x - 8) \\
&\quad + \alpha_4 (x^4 - 8x^3 + 24x^2 - 32x + 16) \\
&= \alpha_4 x^4 + (\alpha_3 - 8\alpha_4) x^3 + (\alpha_2 - 6\alpha_3 + 24\alpha_4) x^2 \\
&\quad + (\alpha_1 - 4\alpha_2 + 12\alpha_3 - 32\alpha_4) x + (-2\alpha_1 + 4\alpha_2 - 8\alpha_3 + 16\alpha_4)
\end{align*}

Equating coefficients (vector equality in $P_4$) gives the homogeneous system of five equations in four variables,

\begin{align*}
\alpha_4 &= 0 \\
\alpha_3 - 8\alpha_4 &= 0 \\
\alpha_2 - 6\alpha_3 + 24\alpha_4 &= 0 \\
\alpha_1 - 4\alpha_2 + 12\alpha_3 - 32\alpha_4 &= 0 \\
-2\alpha_1 + 4\alpha_2 - 8\alpha_3 + 16\alpha_4 &= 0
\end{align*}

We form the coefficient matrix, and row-reduce to obtain a matrix in reduced row-echelon form

\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

With only the trivial solution to this homogeneous system, we conclude that the only scalars that will form a relation of linear dependence are the trivial ones, and therefore the set $S$ is linearly independent (Definition LI). Finally, $S$ has earned the right to be called a basis for $W$ (Definition B). △
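The row reduction in Example BSP4 can be replayed by machine. What follows is a minimal sketch in plain Python, not the text's own method: the helper name `rref` and the use of `fractions.Fraction` for exact arithmetic are assumptions made here for illustration. It row-reduces the coefficient matrix of the homogeneous system (columns ordered $\alpha_1, \alpha_2, \alpha_3, \alpha_4$) and counts the pivot columns.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form of a matrix given as a list of rows (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # Look for a nonzero entry in this column, at or below the current pivot row.
        src = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if src is None:
            continue
        m[pivot_row], m[src] = m[src], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        # Clear every other entry in the pivot column.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

# One row per equation of the homogeneous system in Example BSP4.
coefficients = [
    [0, 0, 0, 1],
    [0, 0, 1, -8],
    [0, 1, -6, 24],
    [1, -4, 12, -32],
    [-2, 4, -8, 16],
]
reduced = rref(coefficients)
pivot_count = sum(1 for row in reduced if any(x != 0 for x in row))
for row in reduced:
    print([str(x) for x in row])
print("only the trivial solution:", pivot_count == 4)
```

Four pivot columns for four variables leaves no free variable, which is the computational content of "only the trivial solution."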


Example  BSM22  A basis  for  a subspace  of  M22 
In Example SSM22 we discovered that

\[ Q = \left\{ \begin{bmatrix} -3 & 1 \\ 0 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & 0 \\ -4 & 1 \end{bmatrix} \right\} \]

is a spanning set for the subspace

\[ Z = \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \,\middle|\, a + 3b - c - 5d = 0,\ -2a - 6b + 3c + 14d = 0 \right\} \]

of the vector space of all $2 \times 2$ matrices, $M_{22}$. If we can also determine that $Q$ is linearly independent in $Z$ (or in $M_{22}$), then it will qualify as a basis for $Z$.

Let us begin with a relation of linear dependence.

\[
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
= \alpha_1 \begin{bmatrix} -3 & 1 \\ 0 & 0 \end{bmatrix}
+ \alpha_2 \begin{bmatrix} 1 & 0 \\ -4 & 1 \end{bmatrix}
= \begin{bmatrix} -3\alpha_1 + \alpha_2 & \alpha_1 \\ -4\alpha_2 & \alpha_2 \end{bmatrix}
\]

Using our definition of matrix equality (Definition ME) we equate entries and get a homogeneous system of four equations in two variables,

\begin{align*}
-3\alpha_1 + \alpha_2 &= 0 \\
\alpha_1 &= 0 \\
-4\alpha_2 &= 0 \\
\alpha_2 &= 0
\end{align*}

We could row-reduce the coefficient matrix of this homogeneous system, but it is not necessary. The second and fourth equations tell us that $\alpha_1 = 0$, $\alpha_2 = 0$ is the only solution to this homogeneous system. This qualifies the set $Q$ as being linearly independent, since the only relation of linear dependence is trivial (Definition LI). Therefore $Q$ is a basis for $Z$ (Definition B). △

Example  BC  Basis  for  the  crazy  vector  space 

In  Example  LIC  and  Example  SSC  we  determined  that  the  set  R = {(1,  0),  (6,  3)} 
from  the  crazy  vector  space,  C (Example  CVS),  is  linearly  independent  and  is  a 
spanning  set  for  C.  By  Definition  B we  see  that  R is  a basis  for  C.  A 

We  have  seen  that  several  of  the  sets  associated  with  a matrix  are  subspaces 
of  vector  spaces  of  column  vectors.  Specifically  these  are  the  null  space  (Theorem 
NSMS),  column  space  (Theorem  CSMS),  row  space  (Theorem  RSMS)  and  left  null 
space  (Theorem  LNSMS).  As  subspaces  they  are  vector  spaces  (Definition  S)  and  it 
is  natural  to  ask  about  bases  for  these  vector  spaces.  Theorem  BNS,  Theorem  BCS, 
Theorem  BRS  each  have  conclusions  that  provide  linearly  independent  spanning  sets 
for (respectively) the null space, column space, and row space. Notice that each of
these theorems contains the word “basis” in its title, even though we did not know
the  precise  meaning  of  the  word  at  the  time.  To  find  a basis  for  a left  null  space  we 
can  use  the  definition  of  this  subspace  as  a null  space  (Definition  LNS)  and  apply 
Theorem  BNS.  Or  Theorem  FS  tells  us  that  the  left  null  space  can  be  expressed  as 
a row  space  and  we  can  then  use  Theorem  BRS. 

Theorem  BS  is  another  early  result  that  provides  a linearly  independent  spanning 
set  (i.e.  a basis)  as  its  conclusion.  If  a vector  space  of  column  vectors  can  be 
expressed  as  a span  of  a set  of  column  vectors,  then  Theorem  BS  can  be  employed 
in  a straightforward  manner  to  quickly  yield  a basis. 

Subsection  BSCV 

Bases  for  Spans  of  Column  Vectors 

We  have  seen  several  examples  of  bases  in  different  vector  spaces.  In  this  subsection, 
and  the  next  (Subsection  B.BNM),  we  will  consider  building  bases  for  Cm  and  its 
subspaces. 

Suppose  we  have  a subspace  of  Cm  that  is  expressed  as  the  span  of  a set  of  vectors, 
S,  and  S is  not  necessarily  linearly  independent,  or  perhaps  not  very  attractive. 
Theorem  REMRS  says  that  row-equivalent  matrices  have  identical  row  spaces,  while 
Theorem  BRS  says  the  nonzero  rows  of  a matrix  in  reduced  row-echelon  form  are  a 
basis  for  the  row  space.  These  theorems  together  give  us  a great  computational  tool 
for  quickly  finding  a basis  for  a subspace  that  is  expressed  originally  as  a span. 

Example  RSB  Row  space  basis 

When  we  first  defined  the  span  of  a set  of  column  vectors,  in  Example  SCAD  we 
looked  at  the  set 


with  an  eye  towards  realizing  W as  the  span  of  a smaller  set.  By  building  relations 
of  linear  dependence  (though  we  did  not  know  them  by  that  name  then)  we  were 
able  to  remove  two  vectors  and  write  W as  the  span  of  the  other  two  vectors.  These 
two  remaining  vectors  formed  a linearly  independent  set,  even  though  we  did  not 
know  that  at  the  time. 

Now we know that $W$ is a subspace and must have a basis. Consider the matrix, $C$, whose rows are the vectors in the spanning set for $W$,

\[ C = \begin{bmatrix} 2 & -3 & 1 \\ \vdots & \vdots & \vdots \\ -7 & -6 & -5 \end{bmatrix} \]

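Then, by Definition RSM, the row space of $C$ will be $W$, $\mathcal{R}(C) = W$. Theorem BRS tells us that if we row-reduce $C$, the nonzero rows of the row-equivalent matrix in reduced row-echelon form will be a basis for $\mathcal{R}(C)$, and hence a basis for $W$. Let us do it: $C$ row-reduces to

\[ \begin{bmatrix} 1 & 0 & \frac{7}{11} \\ 0 & 1 & \frac{1}{11} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]

If we convert the two nonzero rows to column vectors then we have a basis,

\[ B = \left\{ \begin{bmatrix} 1 \\ 0 \\ \frac{7}{11} \end{bmatrix},\ \begin{bmatrix} 0 \\ 1 \\ \frac{1}{11} \end{bmatrix} \right\} \]

and $W = \langle B \rangle$.

For aesthetic reasons, we might wish to multiply each vector in $B$ by 11, which will not change the spanning or linear independence properties of $B$ as a basis. Then we can also write

\[ W = \left\langle \left\{ \begin{bmatrix} 11 \\ 0 \\ 7 \end{bmatrix},\ \begin{bmatrix} 0 \\ 11 \\ 1 \end{bmatrix} \right\} \right\rangle \]

△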


Example  IAS  provides  another  example  of  this  flavor,  though  now  we  can  notice 
that  X is  a subspace,  and  that  the  resulting  set  of  three  vectors  is  a basis.  This  is 
such  a powerful  technique  that  we  should  do  one  more  example. 


Example  RS  Reducing  a span 

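In Example RSC5 we began with a set of $n = 4$ vectors from $\mathbb{C}^5$,

\[
R = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\} = \left\{
\begin{bmatrix} 1 \\ 2 \\ -1 \\ 3 \\ 2 \end{bmatrix},\
\begin{bmatrix} 2 \\ 1 \\ 3 \\ 1 \\ 2 \end{bmatrix},\
\begin{bmatrix} 0 \\ -7 \\ 6 \\ -11 \\ -2 \end{bmatrix},\
\begin{bmatrix} 4 \\ 1 \\ 2 \\ 1 \\ 6 \end{bmatrix}
\right\}
\]

and defined $V = \langle R \rangle$. Our goal in that problem was to find a relation of linear dependence on the vectors in $R$, solve the resulting equation for one of the vectors, and re-express $V$ as the span of a set of three vectors.

Here is another way to accomplish something similar. The row space of the matrix

\[ A = \begin{bmatrix} 1 & 2 & -1 & 3 & 2 \\ 2 & 1 & 3 & 1 & 2 \\ 0 & -7 & 6 & -11 & -2 \\ 4 & 1 & 2 & 1 & 6 \end{bmatrix} \]

is equal to $\langle R \rangle$. By Theorem BRS we can row-reduce this matrix, ignore any zero rows, and use the nonzero rows as column vectors that are a basis for the row space of $A$. Row-reducing $A$ creates the matrix

\[ \begin{bmatrix} 1 & 0 & 0 & -\frac{1}{17} & \frac{30}{17} \\ 0 & 1 & 0 & \frac{25}{17} & -\frac{2}{17} \\ 0 & 0 & 1 & -\frac{2}{17} & -\frac{8}{17} \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \]

So

\[
\left\{
\begin{bmatrix} 1 \\ 0 \\ 0 \\ -\frac{1}{17} \\ \frac{30}{17} \end{bmatrix},\
\begin{bmatrix} 0 \\ 1 \\ 0 \\ \frac{25}{17} \\ -\frac{2}{17} \end{bmatrix},\
\begin{bmatrix} 0 \\ 0 \\ 1 \\ -\frac{2}{17} \\ -\frac{8}{17} \end{bmatrix}
\right\}
\]

is a basis for $V$. Our theorem tells us this is a basis; there is no need to verify that the subspace spanned by three vectors (rather than four) is the identical subspace, and there is no need to verify that we have reached the limit in reducing the set, since the set of three vectors is guaranteed to be linearly independent. △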
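The procedure of Example RS (place the spanning vectors in the rows of a matrix, row-reduce, keep the nonzero rows) is mechanical enough to script. The sketch below is one possible illustration in plain Python; the helper `rref` and the variable names are assumptions introduced here, and exact fractions are used so the entries with denominator 17 appear exactly.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form of a matrix given as a list of rows (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # Look for a nonzero entry in this column, at or below the current pivot row.
        src = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if src is None:
            continue
        m[pivot_row], m[src] = m[src], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        # Clear every other entry in the pivot column.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

# Rows of A are the four vectors of R from Example RS.
A = [
    [1, 2, -1, 3, 2],
    [2, 1, 3, 1, 2],
    [0, -7, 6, -11, -2],
    [4, 1, 2, 1, 6],
]
# Theorem BRS: the nonzero rows of the reduced matrix are a basis of the row space.
basis = [row for row in rref(A) if any(x != 0 for x in row)]
for vector in basis:
    print([str(x) for x in vector])
```

Three rows survive, so the four original vectors span a subspace that needs only three basis vectors.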


Subsection  BNM 

Bases  and  Nonsingular  Matrices 

A quick  source  of  diverse  bases  for  Cm  is  the  set  of  columns  of  a nonsingular  matrix. 

Theorem  CNMB  Columns  of  Nonsingular  Matrix  are  a Basis 

Suppose that $A$ is a square matrix of size $m$. Then the columns of $A$ are a basis of $\mathbb{C}^m$ if and only if $A$ is nonsingular.

Proof. ($\Rightarrow$) Suppose that the columns of $A$ are a basis for $\mathbb{C}^m$. Then Definition B says the set of columns is linearly independent. Theorem NMLIC then says that $A$ is nonsingular.

($\Leftarrow$) Suppose that $A$ is nonsingular. Then by Theorem NMLIC this set of columns is linearly independent. Theorem CSNM says that for a nonsingular matrix, $\mathcal{C}(A) = \mathbb{C}^m$. This is equivalent to saying that the columns of $A$ are a spanning set for the vector space $\mathbb{C}^m$. As a linearly independent spanning set, the columns of $A$ qualify as a basis for $\mathbb{C}^m$ (Definition B). ■

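Example CABAK Columns as Basis, Archetype K
Archetype K is the $5 \times 5$ matrix

\[ K = \begin{bmatrix} 10 & 18 & 24 & 24 & -12 \\ 12 & -2 & -6 & 0 & -18 \\ -30 & -21 & -23 & -30 & 39 \\ 27 & 30 & 36 & 37 & -30 \\ 18 & 24 & 30 & 30 & -20 \end{bmatrix} \]

which is row-equivalent to the $5 \times 5$ identity matrix $I_5$. So by Theorem NMRRI, $K$ is nonsingular. Then Theorem CNMB says the set

\[
\left\{
\begin{bmatrix} 10 \\ 12 \\ -30 \\ 27 \\ 18 \end{bmatrix},\
\begin{bmatrix} 18 \\ -2 \\ -21 \\ 30 \\ 24 \end{bmatrix},\
\begin{bmatrix} 24 \\ -6 \\ -23 \\ 36 \\ 30 \end{bmatrix},\
\begin{bmatrix} 24 \\ 0 \\ -30 \\ 37 \\ 30 \end{bmatrix},\
\begin{bmatrix} -12 \\ -18 \\ 39 \\ -30 \\ -20 \end{bmatrix}
\right\}
\]

is a (novel) basis of $\mathbb{C}^5$. △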
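Theorem CNMB turns the question "are these $m$ vectors a basis of $\mathbb{C}^m$?" into a single row reduction. As a hedged illustration (the helpers `rref` and `columns_are_basis` are names invented here, not anything defined in the text), the following plain-Python sketch tests Archetype K by checking that it row-reduces to the identity matrix, which is the criterion of Theorem NMRRI.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form of a matrix given as a list of rows (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # Look for a nonzero entry in this column, at or below the current pivot row.
        src = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if src is None:
            continue
        m[pivot_row], m[src] = m[src], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        # Clear every other entry in the pivot column.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

def columns_are_basis(square):
    """True when a square matrix row-reduces to the identity (so it is nonsingular)."""
    n = len(square)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return rref(square) == identity

K = [
    [10, 18, 24, 24, -12],
    [12, -2, -6, 0, -18],
    [-30, -21, -23, -30, 39],
    [27, 30, 36, 37, -30],
    [18, 24, 30, 30, -20],
]
# A matrix with a repeated row is singular, so its columns cannot be a basis.
singular = [[1, 2], [1, 2]]
print(columns_are_basis(K), columns_are_basis(singular))
```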

Perhaps  we  should  view  the  fact  that  the  standard  unit  vectors  are  a basis 
(Theorem  SUVB)  as  just  a simple  corollary  of  Theorem  CNMB?  (See  Proof  Technique 
LC.) 

With  a new  equivalence  for  a nonsingular  matrix,  we  can  update  our  list  of 
equivalences. 

Theorem  NME5  Nonsingular  Matrix  Equivalences,  Round  5 
Suppose that $A$ is a square matrix of size $n$. The following are equivalent.

1. $A$ is nonsingular.

2. $A$ row-reduces to the identity matrix.

3. The null space of $A$ contains only the zero vector, $\mathcal{N}(A) = \{\mathbf{0}\}$.

4. The linear system $\mathcal{LS}(A, \mathbf{b})$ has a unique solution for every possible choice of $\mathbf{b}$.

5. The columns of $A$ are a linearly independent set.

6. $A$ is invertible.

7. The column space of $A$ is $\mathbb{C}^n$, $\mathcal{C}(A) = \mathbb{C}^n$.

8. The columns of $A$ are a basis for $\mathbb{C}^n$.


Proof. With a new equivalence for a nonsingular matrix in Theorem CNMB we can expand Theorem NME4. ■

Subsection OBC

Orthonormal  Bases  and  Coordinates 

We  learned  about  orthogonal  sets  of  vectors  in  Cm  back  in  Section  O,  and  we  also 
learned  that  orthogonal  sets  are  automatically  linearly  independent  (Theorem  OSLI). 
When  an  orthogonal  set  also  spans  a subspace  of  Cm,  then  the  set  is  a basis.  And 
when  the  set  is  orthonormal,  then  the  set  is  an  incredibly  nice  basis.  We  will  back  up 
this  claim  with  a theorem,  but  first  consider  how  you  might  manufacture  such  a set. 

Suppose  that  W is  a subspace  of  Cm  with  basis  B.  Then  B spans  W and  is 
a linearly  independent  set  of  nonzero  vectors.  We  can  apply  the  Gram-Schmidt 
Procedure  (Theorem  GSP)  and  obtain  a linearly  independent  set  T such  that 
(T)  = (B)  = W and  T is  orthogonal.  In  other  words,  T is  a basis  for  W,  and  is  an 
orthogonal  set.  By  scaling  each  vector  of  T to  norm  1,  we  can  convert  T into  an 
orthonormal  set,  without  destroying  the  properties  that  make  it  a basis  of  W.  In 
short,  we  can  convert  any  basis  into  an  orthonormal  basis.  Example  GSTV,  followed 
by  Example  ONTV,  illustrates  this  process. 

Unitary  matrices  (Definition  UM)  are  another  good  source  of  orthonormal  bases 
(and  vice  versa).  Suppose  that  Q is  a unitary  matrix  of  size  n.  Then  the  n columns  of 
Q form  an  orthonormal  set  (Theorem  CUMOS)  that  is  therefore  linearly  independent 
(Theorem  OSLI).  Since  Q is  invertible  (Theorem  UMI),  we  know  Q is  nonsingular 
(Theorem  NI),  and  then  the  columns  of  Q span  Cn  (Theorem  CSNM).  So  the  columns 
of  a unitary  matrix  of  size  n are  an  orthonormal  basis  for  C". 

Why  all  the  fuss  about  orthonormal  bases?  Theorem  VRRB  told  us  that  any 
vector  in  a vector  space  could  be  written,  uniquely,  as  a linear  combination  of  basis 
vectors.  For  an  orthonormal  basis,  finding  the  scalars  for  this  linear  combination 
is  extremely  easy,  and  this  is  the  content  of  the  next  theorem.  Furthermore,  with 
vectors  written  this  way  (as  linear  combinations  of  the  elements  of  an  orthonormal 
set)  certain  computations  and  analysis  become  much  easier.  Here  is  the  promised 
theorem. 

Theorem  COB  Coordinates  and  Orthonormal  Bases 

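Suppose that $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_p\}$ is an orthonormal basis of the subspace $W$ of $\mathbb{C}^m$. For any $\mathbf{w} \in W$,

\[ \mathbf{w} = \langle \mathbf{v}_1, \mathbf{w} \rangle \mathbf{v}_1 + \langle \mathbf{v}_2, \mathbf{w} \rangle \mathbf{v}_2 + \langle \mathbf{v}_3, \mathbf{w} \rangle \mathbf{v}_3 + \cdots + \langle \mathbf{v}_p, \mathbf{w} \rangle \mathbf{v}_p \]

Proof. Because $B$ is a basis of $W$, Theorem VRRB tells us that we can write $\mathbf{w}$ uniquely as a linear combination of the vectors in $B$. So it is not this aspect of the conclusion that makes this theorem interesting. What is interesting is that the particular scalars are so easy to compute. No need to solve big systems of equations; just do an inner product of $\mathbf{w}$ with $\mathbf{v}_i$ to arrive at the coefficient of $\mathbf{v}_i$ in the linear combination.

So begin the proof by writing $\mathbf{w}$ as a linear combination of the vectors in $B$, using unknown scalars,

\[ \mathbf{w} = a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + a_3\mathbf{v}_3 + \cdots + a_p\mathbf{v}_p \]

and compute,

\begin{align*}
\langle \mathbf{v}_i, \mathbf{w} \rangle
&= \left\langle \mathbf{v}_i, \sum_{k=1}^{p} a_k \mathbf{v}_k \right\rangle && \text{Theorem VRRB} \\
&= \sum_{k=1}^{p} \langle \mathbf{v}_i, a_k \mathbf{v}_k \rangle && \text{Theorem IPVA} \\
&= \sum_{k=1}^{p} a_k \langle \mathbf{v}_i, \mathbf{v}_k \rangle && \text{Theorem IPSM} \\
&= a_i \langle \mathbf{v}_i, \mathbf{v}_i \rangle + \sum_{\substack{k=1 \\ k \ne i}}^{p} a_k \langle \mathbf{v}_i, \mathbf{v}_k \rangle && \text{Property C} \\
&= a_i (1) + \sum_{\substack{k=1 \\ k \ne i}}^{p} a_k (0) && \text{Definition ONS} \\
&= a_i
\end{align*}

So the (unique) scalars for the linear combination are indeed the inner products advertised in the conclusion of the theorem’s statement. ■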

Example  CROB4  Coordinatization  relative  to  an  orthonormal  basis,  C4 
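The set

\[
\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4\} = \left\{
\begin{bmatrix} 1+i \\ 1 \\ 1-i \\ i \end{bmatrix},\
\begin{bmatrix} 1+5i \\ 6+5i \\ -7-i \\ 1-6i \end{bmatrix},\
\begin{bmatrix} -7+34i \\ -8-23i \\ -10+22i \\ 30+13i \end{bmatrix},\
\begin{bmatrix} -2-4i \\ 6+i \\ 4+3i \\ 6-i \end{bmatrix}
\right\}
\]

was proposed, and partially verified, as an orthogonal set in Example AOS. Let us scale each vector to norm 1, so as to form an orthonormal set in $\mathbb{C}^4$. Then by Theorem OSLI the set will be linearly independent, and by Theorem NME5 the set will be a basis for $\mathbb{C}^4$. So, once scaled to norm 1, the adjusted set will be an orthonormal basis of $\mathbb{C}^4$. The norms are,

\[ \|\mathbf{x}_1\| = \sqrt{6} \qquad \|\mathbf{x}_2\| = \sqrt{174} \qquad \|\mathbf{x}_3\| = \sqrt{3451} \qquad \|\mathbf{x}_4\| = \sqrt{119} \]

So an orthonormal basis is

\[
B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\} = \left\{
\frac{1}{\sqrt{6}} \begin{bmatrix} 1+i \\ 1 \\ 1-i \\ i \end{bmatrix},\
\frac{1}{\sqrt{174}} \begin{bmatrix} 1+5i \\ 6+5i \\ -7-i \\ 1-6i \end{bmatrix},\
\frac{1}{\sqrt{3451}} \begin{bmatrix} -7+34i \\ -8-23i \\ -10+22i \\ 30+13i \end{bmatrix},\
\frac{1}{\sqrt{119}} \begin{bmatrix} -2-4i \\ 6+i \\ 4+3i \\ 6-i \end{bmatrix}
\right\}
\]

Now, to illustrate Theorem COB, choose any vector from $\mathbb{C}^4$, say $\mathbf{w} = \begin{bmatrix} 2 \\ -3 \\ 1 \\ 4 \end{bmatrix}$, and compute

\[
\langle \mathbf{v}_1, \mathbf{w} \rangle = \frac{-5i}{\sqrt{6}} \qquad
\langle \mathbf{v}_2, \mathbf{w} \rangle = \frac{-19+30i}{\sqrt{174}} \qquad
\langle \mathbf{v}_3, \mathbf{w} \rangle = \frac{120-211i}{\sqrt{3451}} \qquad
\langle \mathbf{v}_4, \mathbf{w} \rangle = \frac{6+12i}{\sqrt{119}}
\]

Then Theorem COB guarantees that

\begin{align*}
\begin{bmatrix} 2 \\ -3 \\ 1 \\ 4 \end{bmatrix}
&= \frac{-5i}{\sqrt{6}} \left( \frac{1}{\sqrt{6}} \begin{bmatrix} 1+i \\ 1 \\ 1-i \\ i \end{bmatrix} \right)
+ \frac{-19+30i}{\sqrt{174}} \left( \frac{1}{\sqrt{174}} \begin{bmatrix} 1+5i \\ 6+5i \\ -7-i \\ 1-6i \end{bmatrix} \right) \\
&\quad + \frac{120-211i}{\sqrt{3451}} \left( \frac{1}{\sqrt{3451}} \begin{bmatrix} -7+34i \\ -8-23i \\ -10+22i \\ 30+13i \end{bmatrix} \right)
+ \frac{6+12i}{\sqrt{119}} \left( \frac{1}{\sqrt{119}} \begin{bmatrix} -2-4i \\ 6+i \\ 4+3i \\ 6-i \end{bmatrix} \right)
\end{align*}

as you might want to check (if you have unlimited patience).

△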
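For readers without unlimited patience, the check in Example CROB4 can be delegated to a few lines of code. This is only a sketch under stated assumptions: it uses Python's built-in complex numbers in floating point (so equality is tested up to a small tolerance), and the function `inner` conjugates its first argument, a convention chosen here because it reproduces the inner products listed in the example.

```python
import math

def inner(u, v):
    """Inner product on C^n, conjugating the entries of the first vector (assumed convention)."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(u, v))

# The orthogonal set from Example CROB4 and the vector w chosen there.
x = [
    [1 + 1j, 1, 1 - 1j, 1j],
    [1 + 5j, 6 + 5j, -7 - 1j, 1 - 6j],
    [-7 + 34j, -8 - 23j, -10 + 22j, 30 + 13j],
    [-2 - 4j, 6 + 1j, 4 + 3j, 6 - 1j],
]
w = [2, -3, 1, 4]

# Scale each vector to norm 1 to obtain the orthonormal basis B.
basis = []
for vec in x:
    norm = math.sqrt(inner(vec, vec).real)
    basis.append([entry / norm for entry in vec])

# Theorem COB: the coordinate of w along v_k is the inner product <v_k, w>.
coords = [inner(v, w) for v in basis]

# Rebuild w from its coordinates and measure the largest discrepancy.
rebuilt = [sum(coords[k] * basis[k][i] for k in range(4)) for i in range(4)]
error = max(abs(rebuilt[i] - w[i]) for i in range(4))
print("coordinates:", coords)
print("largest reconstruction error:", error)
```

No linear system is solved anywhere in this computation; four inner products are enough, which is exactly the convenience the theorem promises.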


A slightly  less  intimidating  example  follows,  in  three  dimensions  and  with  just 
real  numbers. 


Example  CROB3  Coordinatization  relative  to  an  orthonormal  basis,  C3 
The  set 


{xi,  x2,  x3} 


is  a linearly  independent  set,  which  the  Gram-Schmidt  Process  (Theorem  GSP) 
converts  to  an  orthogonal  set,  and  which  can  then  be  converted  to  the  orthonormal 
set, 


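\[
B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\} = \left\{
\frac{1}{\sqrt{6}} \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix},\
\frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix},\
\frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}
\right\}
\]

which is therefore an orthonormal basis of $\mathbb{C}^3$. With three vectors in $\mathbb{C}^3$, all with real number entries, the inner product (Definition IP) reduces to the usual “dot product” (or scalar product) and the orthogonal pairs of vectors can be interpreted as perpendicular pairs of directions. So the vectors in $B$ serve as replacements for our usual 3-D axes, or the usual 3-D unit vectors $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$. We would like to decompose arbitrary vectors into “components” in the directions of each of these basis vectors.

It is Theorem COB that tells us how to do this.

Suppose that we choose $\mathbf{w} = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix}$. Compute

\[
\langle \mathbf{v}_1, \mathbf{w} \rangle = \frac{5}{\sqrt{6}} \qquad
\langle \mathbf{v}_2, \mathbf{w} \rangle = \frac{3}{\sqrt{2}} \qquad
\langle \mathbf{v}_3, \mathbf{w} \rangle = \frac{8}{\sqrt{3}}
\]

then Theorem COB guarantees that

\[
\begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix}
= \frac{5}{\sqrt{6}} \left( \frac{1}{\sqrt{6}} \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \right)
+ \frac{3}{\sqrt{2}} \left( \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \right)
+ \frac{8}{\sqrt{3}} \left( \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} \right)
\]

which you should be able to check easily, even if you do not have much patience. △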


Not  only  do  the  columns  of  a unitary  matrix  form  an  orthonormal  basis,  but  there 
is  a deeper  connection  between  orthonormal  bases  and  unitary  matrices.  Informally, 
the  next  theorem  says  that  if  we  transform  each  vector  of  an  orthonormal  basis  by 
multiplying  it  by  a unitary  matrix,  then  the  resulting  set  will  be  another  orthonormal 
basis.  And  more  remarkably,  any  matrix  with  this  property  must  be  unitary!  As  an 
equivalence  (Proof  Technique  E)  we  could  take  this  as  our  defining  property  of  a 
unitary  matrix,  though  it  might  not  have  the  same  utility  as  Definition  UM. 
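A small numerical experiment may make the informal statement concrete before the formal theorem. The sketch below is illustrative only: the matrix `U`, the basis `B`, the non-unitary matrix `shear`, and the helper names are all hypothetical choices made here, and floating-point comparisons use a tolerance. It multiplies each vector of an orthonormal basis of $\mathbb{C}^2$ by a unitary matrix and by a non-unitary one, then tests whether the images are still orthonormal.

```python
import math

def inner(u, v):
    """Inner product on C^n, conjugating the entries of the first vector (assumed convention)."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(u, v))

def apply(matrix, vector):
    """Matrix-vector product, with the matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def is_orthonormal(vectors, tol=1e-12):
    """Check <v_i, v_j> = 1 when i = j and 0 otherwise, up to a tolerance."""
    for i, u in enumerate(vectors):
        for j, v in enumerate(vectors):
            target = 1 if i == j else 0
            if abs(inner(u, v) - target) > tol:
                return False
    return True

s = 1 / math.sqrt(2)
B = [[s, s], [s, -s]]            # an orthonormal basis of C^2
U = [[s, s * 1j], [s * 1j, s]]   # unitary: its columns are orthonormal
shear = [[1, 1], [0, 1]]         # nonsingular but not unitary

image_under_U = [apply(U, x) for x in B]
image_under_shear = [apply(shear, x) for x in B]
print(is_orthonormal(B), is_orthonormal(image_under_U), is_orthonormal(image_under_shear))
```

The shear still sends the basis to a basis, but lengths and angles are not preserved, so the image fails to be orthonormal.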


Theorem  UMCOB  Unitary  Matrices  Convert  Orthonormal  Bases 

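Let $A$ be an $n \times n$ matrix and $B = \{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_n\}$ be an orthonormal basis of $\mathbb{C}^n$. Define

\[ C = \{A\mathbf{x}_1, A\mathbf{x}_2, A\mathbf{x}_3, \ldots, A\mathbf{x}_n\} \]

Then $A$ is a unitary matrix if and only if $C$ is an orthonormal basis of $\mathbb{C}^n$.


Proof. ($\Rightarrow$) Assume $A$ is a unitary matrix and establish several facts about $C$. First we check that $C$ is an orthonormal set (Definition ONS). By Theorem UMPIP, for $i \ne j$,

\[ \langle A\mathbf{x}_i, A\mathbf{x}_j \rangle = \langle \mathbf{x}_i, \mathbf{x}_j \rangle = 0 \]

Similarly, Theorem UMPIP also gives, for $1 \le i \le n$,

\[ \|A\mathbf{x}_i\| = \|\mathbf{x}_i\| = 1 \]

As $C$ is an orthogonal set (Definition OSV), Theorem OSLI yields the linear independence of $C$. Having established that the column vectors in $C$ form a linearly independent set, a matrix whose columns are the vectors of $C$ is nonsingular (Theorem NMLIC), and hence these vectors form a basis of $\mathbb{C}^n$ by Theorem CNMB.

($\Leftarrow$) Now assume that $C$ is an orthonormal set. Let $\mathbf{y}$ be an arbitrary vector from $\mathbb{C}^n$. Since $B$ spans $\mathbb{C}^n$, there are scalars, $a_1, a_2, a_3, \ldots, a_n$, such that

\[ \mathbf{y} = a_1\mathbf{x}_1 + a_2\mathbf{x}_2 + a_3\mathbf{x}_3 + \cdots + a_n\mathbf{x}_n \]

Now

\begin{align*}
A^{*}A\mathbf{y}
&= \sum_{i=1}^{n} \langle \mathbf{x}_i, A^{*}A\mathbf{y} \rangle \mathbf{x}_i && \text{Theorem COB} \\
&= \sum_{i=1}^{n} \left\langle \mathbf{x}_i, A^{*}A \sum_{j=1}^{n} a_j \mathbf{x}_j \right\rangle \mathbf{x}_i && \text{Definition SSVS} \\
&= \sum_{i=1}^{n} \left\langle \mathbf{x}_i, \sum_{j=1}^{n} A^{*}A a_j \mathbf{x}_j \right\rangle \mathbf{x}_i && \text{Theorem MMDAA} \\
&= \sum_{i=1}^{n} \left\langle \mathbf{x}_i, \sum_{j=1}^{n} a_j A^{*}A \mathbf{x}_j \right\rangle \mathbf{x}_i && \text{Theorem MMSMM} \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} \langle \mathbf{x}_i, a_j A^{*}A \mathbf{x}_j \rangle \mathbf{x}_i && \text{Theorem IPVA} \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} a_j \langle \mathbf{x}_i, A^{*}A \mathbf{x}_j \rangle \mathbf{x}_i && \text{Theorem IPSM} \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} a_j \langle A\mathbf{x}_i, A\mathbf{x}_j \rangle \mathbf{x}_i && \text{Theorem AIP} \\
&= \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} a_j \langle A\mathbf{x}_i, A\mathbf{x}_j \rangle \mathbf{x}_i + \sum_{\ell=1}^{n} a_\ell \langle A\mathbf{x}_\ell, A\mathbf{x}_\ell \rangle \mathbf{x}_\ell && \text{Property C} \\
&= \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} a_j (0) \mathbf{x}_i + \sum_{\ell=1}^{n} a_\ell (1) \mathbf{x}_\ell && \text{Definition ONS} \\
&= \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} \mathbf{0} + \sum_{\ell=1}^{n} a_\ell \mathbf{x}_\ell && \text{Theorem ZSSM} \\
&= \sum_{\ell=1}^{n} a_\ell \mathbf{x}_\ell && \text{Property Z} \\
&= \mathbf{y} \\
&= I_n \mathbf{y} && \text{Theorem MMIM}
\end{align*}

Since the choice of $\mathbf{y}$ was arbitrary, Theorem EMMVP tells us that $A^{*}A = I_n$, so $A$ is unitary (Definition UM). ■

Reading Questions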


1. The matrix below is nonsingular. What can you now say about its columns?

\[ A = \begin{bmatrix} -3 & 0 & 1 \\ 1 & 2 & 1 \\ 5 & 1 & 6 \end{bmatrix} \]

2. Write the vector $\mathbf{w} = \begin{bmatrix} 6 \\ 6 \\ 15 \end{bmatrix}$ as a linear combination of the columns of the matrix $A$ above. How many ways are there to answer this question?


3.  Why  is  an  orthonormal  basis  desirable? 


Exercises 

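C10† Find a basis for $\langle S \rangle$, where

\[
S = \left\{
\begin{bmatrix} 1 \\ 3 \\ 2 \\ 1 \end{bmatrix},\
\begin{bmatrix} 1 \\ 2 \\ 1 \\ 1 \end{bmatrix},\
\begin{bmatrix} 1 \\ 1 \\ 0 \\ 1 \end{bmatrix},\
\begin{bmatrix} 1 \\ 2 \\ 2 \\ 1 \end{bmatrix},\
\begin{bmatrix} 3 \\ 4 \\ 1 \\ 3 \end{bmatrix}
\right\}
\]

C11† Find a basis for the subspace $W$ of $\mathbb{C}^4$,

\[
W = \left\{ \begin{bmatrix} a + b - 2c \\ a + b - 2c + d \\ -2a + 2b + 4c - d \\ b + d \end{bmatrix} \,\middle|\, a, b, c, d \in \mathbb{C} \right\}
\]

C12† Find a basis for the vector space $T$ of lower triangular $3 \times 3$ matrices; that is, matrices of the form

\[ \begin{bmatrix} * & 0 & 0 \\ * & * & 0 \\ * & * & * \end{bmatrix} \]

where an asterisk represents any complex number.

C13† Find a basis for the subspace $Q$ of $P_2$, $Q = \{p(x) = a + bx + cx^2 \mid p(0) = 0\}$.

C14† Find a basis for the subspace $R$ of $P_2$, $R = \{p(x) = a + bx + cx^2 \mid p'(0) = 0\}$, where $p'$ denotes the derivative.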

C40† From Example RSB, form an arbitrary (and nontrivial) linear combination of the four vectors in the original spanning set for $W$. So the result of this computation is of course an element of $W$. As such, this vector should be a linear combination of the basis vectors in $B$. Find the (unique) scalars that provide this linear combination. Repeat with another linear combination of the original four vectors.

C80 Prove that $\{(1, 2), (2, 3)\}$ is a basis for the crazy vector space $C$ (Example CVS).

M20† In Example BM provide the verifications (linear independence and spanning) to show that $B$ is a basis of $M_{mn}$.


T50† Theorem UMCOB says that unitary matrices are characterized as those matrices that “carry” orthonormal bases to orthonormal bases. This problem asks you to prove a similar result: nonsingular matrices are characterized as those matrices that “carry” bases to bases.

More precisely, suppose that $A$ is a square matrix of size $n$ and $B = \{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_n\}$ is a basis of $\mathbb{C}^n$. Prove that $A$ is nonsingular if and only if $C = \{A\mathbf{x}_1, A\mathbf{x}_2, A\mathbf{x}_3, \ldots, A\mathbf{x}_n\}$ is a basis of $\mathbb{C}^n$. (See also Exercise PD.T33, Exercise MR.T20.)

T51† Use the result of Exercise B.T50 to build a very concise proof of Theorem CNMB. (Hint: make a judicious choice for the basis $B$.)


Section  D 
Dimension 


Almost  every  vector  space  we  have  encountered  has  been  infinite  in  size  (an  exception 
is  Example  VSS).  But  some  are  bigger  and  richer  than  others.  Dimension,  once 
suitably  defined,  will  be  a measure  of  the  size  of  a vector  space,  and  a useful  tool 
for  studying  its  properties.  You  probably  already  have  a rough  notion  of  what  a 
mathematical  definition  of  dimension  might  be  — try  to  forget  these  imprecise  ideas 
and  go  with  the  new  ones  given  here. 

Subsection  D 
Dimension 

Definition  D Dimension 

Suppose that $V$ is a vector space and $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_t\}$ is a basis of $V$. Then the dimension of $V$ is defined by $\dim(V) = t$. If $V$ has no finite bases, we say $V$ has infinite dimension. □

This  is  a very  simple  definition,  which  belies  its  power.  Grab  a basis,  any  basis, 
and  count  up  the  number  of  vectors  it  contains.  That  is  the  dimension.  However,  this 
simplicity  causes  a problem.  Given  a vector  space,  you  and  I could  each  construct 
different  bases  — remember  that  a vector  space  might  have  many  bases.  And  what  if 
your  basis  and  my  basis  had  different  sizes?  Applying  Definition  D we  would  arrive 
at  different  numbers!  With  our  current  knowledge  about  vector  spaces,  we  would 
have  to  say  that  dimension  is  not  “well-defined.”  Fortunately,  there  is  a theorem 
that  will  correct  this  problem. 

In  a strictly  logical  progression,  the  next  two  theorems  would  precede  the  definition 
of  dimension.  Many  subsequent  theorems  will  trace  their  lineage  back  to  the  following 
fundamental  result. 

Theorem  SSLD  Spanning  Sets  and  Linear  Dependence 

Suppose that $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_t\}$ is a finite set of vectors which spans the vector space $V$. Then any set of $t + 1$ or more vectors from $V$ is linearly dependent.

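Proof. We want to prove that any set of $t + 1$ or more vectors from $V$ is linearly dependent. So we will begin with a totally arbitrary set of vectors from $V$, $R = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_m\}$, where $m > t$. We will now construct a nontrivial relation of linear dependence on $R$.

Each vector $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_m$ can be written as a linear combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_t$ since $S$ is a spanning set of $V$. This means there exist scalars $a_{ij}$, $1 \le i \le t$, $1 \le j \le m$, so that

\begin{align*}
\mathbf{u}_1 &= a_{11}\mathbf{v}_1 + a_{21}\mathbf{v}_2 + a_{31}\mathbf{v}_3 + \cdots + a_{t1}\mathbf{v}_t \\
\mathbf{u}_2 &= a_{12}\mathbf{v}_1 + a_{22}\mathbf{v}_2 + a_{32}\mathbf{v}_3 + \cdots + a_{t2}\mathbf{v}_t \\
\mathbf{u}_3 &= a_{13}\mathbf{v}_1 + a_{23}\mathbf{v}_2 + a_{33}\mathbf{v}_3 + \cdots + a_{t3}\mathbf{v}_t \\
&\ \ \vdots \\
\mathbf{u}_m &= a_{1m}\mathbf{v}_1 + a_{2m}\mathbf{v}_2 + a_{3m}\mathbf{v}_3 + \cdots + a_{tm}\mathbf{v}_t
\end{align*}

Now we form, unmotivated, the homogeneous system of $t$ equations in the $m$ variables, $x_1, x_2, x_3, \ldots, x_m$, where the coefficients are the just-discovered scalars $a_{ij}$,

\begin{align*}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1m}x_m &= 0 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2m}x_m &= 0 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \cdots + a_{3m}x_m &= 0 \\
&\ \ \vdots \\
a_{t1}x_1 + a_{t2}x_2 + a_{t3}x_3 + \cdots + a_{tm}x_m &= 0
\end{align*}

This is a homogeneous system with more variables than equations (our hypothesis is expressed as $m > t$), so by Theorem HMVEI there are infinitely many solutions. Choose a nontrivial solution and denote it by $x_1 = c_1$, $x_2 = c_2$, $x_3 = c_3$, $\ldots$, $x_m = c_m$. As a solution to the homogeneous system, we then have

\begin{align*}
a_{11}c_1 + a_{12}c_2 + a_{13}c_3 + \cdots + a_{1m}c_m &= 0 \\
a_{21}c_1 + a_{22}c_2 + a_{23}c_3 + \cdots + a_{2m}c_m &= 0 \\
a_{31}c_1 + a_{32}c_2 + a_{33}c_3 + \cdots + a_{3m}c_m &= 0 \\
&\ \ \vdots \\
a_{t1}c_1 + a_{t2}c_2 + a_{t3}c_3 + \cdots + a_{tm}c_m &= 0
\end{align*}

As a collection of nontrivial scalars, $c_1, c_2, c_3, \ldots, c_m$ will provide the nontrivial relation of linear dependence we desire,

\begin{align*}
& c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + c_3\mathbf{u}_3 + \cdots + c_m\mathbf{u}_m \\
&= c_1 (a_{11}\mathbf{v}_1 + a_{21}\mathbf{v}_2 + a_{31}\mathbf{v}_3 + \cdots + a_{t1}\mathbf{v}_t) && \text{Definition SSVS} \\
&\quad + c_2 (a_{12}\mathbf{v}_1 + a_{22}\mathbf{v}_2 + a_{32}\mathbf{v}_3 + \cdots + a_{t2}\mathbf{v}_t) \\
&\quad + c_3 (a_{13}\mathbf{v}_1 + a_{23}\mathbf{v}_2 + a_{33}\mathbf{v}_3 + \cdots + a_{t3}\mathbf{v}_t) \\
&\quad\ \ \vdots \\
&\quad + c_m (a_{1m}\mathbf{v}_1 + a_{2m}\mathbf{v}_2 + a_{3m}\mathbf{v}_3 + \cdots + a_{tm}\mathbf{v}_t) \\
&= c_1a_{11}\mathbf{v}_1 + c_1a_{21}\mathbf{v}_2 + c_1a_{31}\mathbf{v}_3 + \cdots + c_1a_{t1}\mathbf{v}_t && \text{Property DVA} \\
&\quad + c_2a_{12}\mathbf{v}_1 + c_2a_{22}\mathbf{v}_2 + c_2a_{32}\mathbf{v}_3 + \cdots + c_2a_{t2}\mathbf{v}_t \\
&\quad + c_3a_{13}\mathbf{v}_1 + c_3a_{23}\mathbf{v}_2 + c_3a_{33}\mathbf{v}_3 + \cdots + c_3a_{t3}\mathbf{v}_t \\
&\quad\ \ \vdots \\
&\quad + c_ma_{1m}\mathbf{v}_1 + c_ma_{2m}\mathbf{v}_2 + c_ma_{3m}\mathbf{v}_3 + \cdots + c_ma_{tm}\mathbf{v}_t \\
&= (c_1a_{11} + c_2a_{12} + c_3a_{13} + \cdots + c_ma_{1m})\, \mathbf{v}_1 && \text{Property DSA} \\
&\quad + (c_1a_{21} + c_2a_{22} + c_3a_{23} + \cdots + c_ma_{2m})\, \mathbf{v}_2 \\
&\quad + (c_1a_{31} + c_2a_{32} + c_3a_{33} + \cdots + c_ma_{3m})\, \mathbf{v}_3 \\
&\quad\ \ \vdots \\
&\quad + (c_1a_{t1} + c_2a_{t2} + c_3a_{t3} + \cdots + c_ma_{tm})\, \mathbf{v}_t \\
&= (a_{11}c_1 + a_{12}c_2 + a_{13}c_3 + \cdots + a_{1m}c_m)\, \mathbf{v}_1 && \text{Property CMCN} \\
&\quad + (a_{21}c_1 + a_{22}c_2 + a_{23}c_3 + \cdots + a_{2m}c_m)\, \mathbf{v}_2 \\
&\quad + (a_{31}c_1 + a_{32}c_2 + a_{33}c_3 + \cdots + a_{3m}c_m)\, \mathbf{v}_3 \\
&\quad\ \ \vdots \\
&\quad + (a_{t1}c_1 + a_{t2}c_2 + a_{t3}c_3 + \cdots + a_{tm}c_m)\, \mathbf{v}_t \\
&= 0\mathbf{v}_1 + 0\mathbf{v}_2 + 0\mathbf{v}_3 + \cdots + 0\mathbf{v}_t && c_j \text{ as solution} \\
&= \mathbf{0} + \mathbf{0} + \mathbf{0} + \cdots + \mathbf{0} && \text{Theorem ZSSM} \\
&= \mathbf{0} && \text{Property Z}
\end{align*}

That does it. $R$ has been undeniably shown to be a linearly dependent set.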

The  proof  just  given  has  some  monstrous  expressions  in  it,  mostly  owing  to 
the  double  subscripts  present.  Now  is  a great  opportunity  to  show  the  value  of  a 
more  compact  notation.  We  will  rewrite  the  key  steps  of  the  previous  proof  using 
summation  notation,  resulting  in  a more  economical  presentation,  and  even  greater 
insight  into  the  key  aspects  of  the  proof.  So  here  is  an  alternate  proof  — study  it 
carefully. 

Alternate Proof: We want to prove that any set of $t + 1$ or more vectors from $V$ is linearly dependent. So we will begin with a totally arbitrary set of vectors from $V$, $R = \{\mathbf{u}_j \mid 1 \le j \le m\}$, where $m > t$. We will now construct a nontrivial relation of linear dependence on $R$.

Each vector $\mathbf{u}_j$, $1 \le j \le m$ can be written as a linear combination of $\mathbf{v}_i$, $1 \le i \le t$ since $S$ is a spanning set of $V$. This means there are scalars $a_{ij}$, $1 \le i \le t$, $1 \le j \le m$, so that

\[ \mathbf{u}_j = \sum_{i=1}^{t} a_{ij}\mathbf{v}_i \qquad 1 \le j \le m \]

Now we form, unmotivated, the homogeneous system of $t$ equations in the $m$ variables, $x_j$, $1 \le j \le m$, where the coefficients are the just-discovered scalars $a_{ij}$,

\[ \sum_{j=1}^{m} a_{ij}x_j = 0 \qquad 1 \le i \le t \]

This is a homogeneous system with more variables than equations (our hypothesis is expressed as $m > t$), so by Theorem HMVEI there are infinitely many solutions. Choose one of these solutions that is not trivial and denote it by $x_j = c_j$, $1 \le j \le m$. As a solution to the homogeneous system, we then have $\sum_{j=1}^{m} a_{ij}c_j = 0$ for $1 \le i \le t$.

As a collection of nontrivial scalars, $c_j$, $1 \le j \le m$, will provide the nontrivial relation of linear dependence we desire,

\begin{align*}
\sum_{j=1}^{m} c_j\mathbf{u}_j
&= \sum_{j=1}^{m} c_j \left( \sum_{i=1}^{t} a_{ij}\mathbf{v}_i \right) && \text{Definition SSVS} \\
&= \sum_{j=1}^{m} \sum_{i=1}^{t} c_ja_{ij}\mathbf{v}_i && \text{Property DVA} \\
&= \sum_{i=1}^{t} \sum_{j=1}^{m} c_ja_{ij}\mathbf{v}_i && \text{Property C} \\
&= \sum_{i=1}^{t} \sum_{j=1}^{m} a_{ij}c_j\mathbf{v}_i && \text{Property CMCN} \\
&= \sum_{i=1}^{t} \left( \sum_{j=1}^{m} a_{ij}c_j \right) \mathbf{v}_i && \text{Property DSA} \\
&= \sum_{i=1}^{t} 0\mathbf{v}_i && c_j \text{ as solution} \\
&= \sum_{i=1}^{t} \mathbf{0} && \text{Theorem ZSSM} \\
&= \mathbf{0} && \text{Property Z}
\end{align*}

That does it. $R$ has been undeniably shown to be a linearly dependent set. ■
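The construction in the proof is concrete enough to run on a small case. The sketch below is an illustration under explicit assumptions, not part of the proof: the spanning set is taken to be the two standard unit vectors of $\mathbb{C}^2$ (so $t = 2$), the set `R` of $m = 3$ vectors is hypothetical, and `rref` and `nontrivial_solution` are helper names invented here. It builds the coefficient matrix $[a_{ij}]$, finds one nontrivial solution of the homogeneous system by setting a free variable to 1, and confirms that the resulting scalars give a relation of linear dependence on `R`.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form of a matrix given as a list of rows (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # Look for a nonzero entry in this column, at or below the current pivot row.
        src = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if src is None:
            continue
        m[pivot_row], m[src] = m[src], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        # Clear every other entry in the pivot column.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

def nontrivial_solution(coefficients):
    """One nontrivial solution of a homogeneous system with more variables than equations."""
    reduced = rref(coefficients)
    n = len(coefficients[0])
    # Pivot column of each nonzero row; nonzero rows come first in reduced form.
    pivots = [next(j for j, x in enumerate(row) if x != 0)
              for row in reduced if any(x != 0 for x in row)]
    free = next(j for j in range(n) if j not in pivots)
    solution = [Fraction(0)] * n
    solution[free] = Fraction(1)
    for r, p in enumerate(pivots):
        solution[p] = -reduced[r][free]
    return solution

# Hypothetical data: S = {(1,0), (0,1)} spans C^2, and R holds m = 3 vectors.
R = [[1, 2], [3, 1], [0, 5]]
# a[i][j] is the coefficient of v_i when u_j is written in terms of S.
a = [[u[i] for u in R] for i in range(2)]
c = nontrivial_solution(a)
relation = [sum(c[j] * R[j][k] for j in range(3)) for k in range(2)]
print("scalars:", [str(x) for x in c])
print("c1*u1 + c2*u2 + c3*u3 =", [str(x) for x in relation])
```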

Notice how the swap of the two summations is so much easier in the third step above, as opposed to all the rearranging and regrouping that takes place in the previous proof. And using only about half the space. And there are no ellipses ($\cdots$).

Theorem SSLD can be viewed as a generalization of Theorem MVSLD. We know that $\mathbb{C}^m$ has a basis with $m$ vectors in it (Theorem SUVB), so it is a set of $m$ vectors that spans $\mathbb{C}^m$. By Theorem SSLD, any set of more than $m$ vectors from $\mathbb{C}^m$ will be linearly dependent. But this is exactly the conclusion we have in Theorem MVSLD. Maybe this is not a total shock, as the proofs of both theorems rely heavily on Theorem HMVEI. The beauty of Theorem SSLD is that it applies in any vector space. We illustrate the generality of this theorem, and hint at its power, in the next example.

Example  LDP4  Linearly  dependent  set  in  P4 
In Example SSP4 we showed that

\[ S = \{x - 2,\ x^2 - 4x + 4,\ x^3 - 6x^2 + 12x - 8,\ x^4 - 8x^3 + 24x^2 - 32x + 16\} \]

is a spanning set for $W = \{p(x) \mid p \in P_4,\ p(2) = 0\}$. So we can apply Theorem SSLD to $W$ with $t = 4$. Here is a set of five vectors from $W$, as you may check by verifying that each is a polynomial of degree 4 or less and has $x = 2$ as a root,

\[ T = \{p_1,\ p_2,\ p_3,\ p_4,\ p_5\} \subseteq W \]

\begin{align*}
p_1 &= x^4 - 2x^3 + 2x^2 - 8x + 8 \\
p_2 &= -x^3 + 6x^2 - 5x - 6 \\
p_3 &= 2x^4 - 5x^3 + 5x^2 - 7x + 2 \\
p_4 &= -x^4 + 4x^3 - 7x^2 + 6x \\
p_5 &= 4x^3 - 9x^2 + 5x - 6
\end{align*}

By Theorem SSLD we conclude that $T$ is linearly dependent, with no further computations. △
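
The membership check that Example LDP4 leaves to the reader can be sketched in a few lines. This is only an illustration in plain Python, not the book's Sage; the helper name `evaluate` and the coefficient-list encoding (highest degree first) are assumptions made for this sketch.

```python
def evaluate(coeffs, x):
    """Evaluate a polynomial by Horner's rule.

    coeffs lists the coefficients from the highest-degree term
    down to the constant term (an encoding chosen for this sketch).
    """
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

# Coefficients of p1, ..., p5 from Example LDP4, degree 4 down to degree 0.
T = {
    "p1": [1, -2, 2, -8, 8],
    "p2": [0, -1, 6, -5, -6],
    "p3": [2, -5, 5, -7, 2],
    "p4": [-1, 4, -7, 6, 0],
    "p5": [0, 4, -9, 5, -6],
}

# A polynomial of degree at most 4 lies in W exactly when it vanishes at x = 2.
values_at_2 = {name: evaluate(c, 2) for name, c in T.items()}
all_in_W = all(v == 0 for v in values_at_2.values())
print(values_at_2, all_in_W)
```

Once all five polynomials are known to lie in $W$, Theorem SSLD does the rest: five vectors in a space spanned by four must be linearly dependent.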

Theorem  SSLD  is  indeed  powerful,  but  our  main  purpose  in  proving  it  right  now 
was  to  make  sure  that  our  definition  of  dimension  (Definition  D)  is  well-defined. 
Here  is  the  theorem. 

Theorem  BIS  Bases  have  Identical  Sizes 

Suppose  that  V is  a vector  space  with  a finite  basis  B and  a second  basis  C . Then  B 
and  C have  the  same  size. 

Proof.  Suppose  that  C has  more  vectors  than  B.  (Allowing  for  the  possibility  that 
C is  infinite,  we  can  replace  C by  a subset  that  has  more  vectors  than  B .)  As  a 
basis,  B is  a spanning  set  for  V (Definition  B),  so  Theorem  SSLD  says  that  C is 
linearly  dependent.  However,  this  contradicts  the  fact  that  as  a basis  C is  linearly 
independent  (Definition  B).  So  C must  also  be  a finite  set,  with  size  less  than,  or 
equal  to,  that  of  B. 

Suppose  that  B has  more  vectors  than  C.  As  a basis,  C is  a spanning  set  for  V 
(Definition  B),  so  Theorem  SSLD  says  that  B is  linearly  dependent.  However,  this 
contradicts  the  fact  that  as  a basis  B is  linearly  independent  (Definition  B).  So  C 
cannot  be  strictly  smaller  than  B. 

The  only  possibility  left  for  the  sizes  of  B and  C is  for  them  to  be  equal.  ■ 

Theorem BIS tells us that if we find one finite basis in a vector space, then all of its
bases have the same size. This (finally) makes Definition D unambiguous.

Subsection  DVS 
Dimension  of  Vector  Spaces 

We  can  now  collect  the  dimension  of  some  common,  and  not  so  common,  vector 
spaces. 

Theorem DCM Dimension of $\mathbb{C}^m$

The dimension of $\mathbb{C}^m$ (Example VSCV) is $m$.

Proof. Theorem SUVB provides a basis with $m$ vectors. ■

Theorem DP Dimension of $P_n$

The dimension of $P_n$ (Example VSP) is $n + 1$.

Proof. Example BP provides two bases with $n + 1$ vectors. Take your pick. ■

Theorem DM Dimension of $M_{mn}$

The dimension of $M_{mn}$ (Example VSM) is $mn$.

Proof. Example BM provides a basis with $mn$ vectors. ■

Example  DSM22  Dimension  of  a subspace  of  M22 
It should now be plausible that

\[ Z = \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \,\middle|\, 2a + b + 3c + 4d = 0,\ -a + 3b - 5c - 2d = 0 \right\} \]

is a subspace of the vector space $M_{22}$ (Example VSM). (It is.) To find the dimension of $Z$ we must first find a basis, though any old basis will do.

First concentrate on the conditions relating $a$, $b$, $c$ and $d$. They form a homogeneous system of two equations in four variables with coefficient matrix

\[ \begin{bmatrix} 2 & 1 & 3 & 4 \\ -1 & 3 & -5 & -2 \end{bmatrix} \]

We can row-reduce this matrix to obtain

\[ \begin{bmatrix} \boxed{1} & 0 & 2 & 2 \\ 0 & \boxed{1} & -1 & 0 \end{bmatrix} \]

Rewrite the two equations represented by each row of this matrix, expressing the dependent variables ($a$ and $b$) in terms of the free variables ($c$ and $d$), and we obtain,

\begin{align*}
a &= -2c - 2d \\
b &= c
\end{align*}





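We can now write a typical entry of $Z$ strictly in terms of $c$ and $d$, and we can decompose the result,

\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
= \begin{bmatrix} -2c - 2d & c \\ c & d \end{bmatrix}
= \begin{bmatrix} -2c & c \\ c & 0 \end{bmatrix} + \begin{bmatrix} -2d & 0 \\ 0 & d \end{bmatrix}
= c\begin{bmatrix} -2 & 1 \\ 1 & 0 \end{bmatrix} + d\begin{bmatrix} -2 & 0 \\ 0 & 1 \end{bmatrix}
\]

This equation says that an arbitrary matrix in $Z$ can be written as a linear combination of the two vectors in

\[ S = \left\{ \begin{bmatrix} -2 & 1 \\ 1 & 0 \end{bmatrix},\ \begin{bmatrix} -2 & 0 \\ 0 & 1 \end{bmatrix} \right\} \]

so we know that

\[ Z = \langle S \rangle = \left\langle \left\{ \begin{bmatrix} -2 & 1 \\ 1 & 0 \end{bmatrix},\ \begin{bmatrix} -2 & 0 \\ 0 & 1 \end{bmatrix} \right\} \right\rangle \]

Are these two matrices (vectors) also linearly independent? Begin with a relation of linear dependence on $S$,

\begin{align*}
a_1\begin{bmatrix} -2 & 1 \\ 1 & 0 \end{bmatrix} + a_2\begin{bmatrix} -2 & 0 \\ 0 & 1 \end{bmatrix} &= \mathcal{O} \\
\begin{bmatrix} -2a_1 - 2a_2 & a_1 \\ a_1 & a_2 \end{bmatrix} &= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\end{align*}

From the equality of the two entries in the last row, we conclude that $a_1 = 0$, $a_2 = 0$. Thus the only possible relation of linear dependence is the trivial one, and therefore $S$ is linearly independent (Definition LI). So $S$ is a basis for $Z$ (Definition B). Finally, we can conclude that $\dim(Z) = 2$ (Definition D) since $S$ has two elements. △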
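
The count in Example DSM22 can be cross-checked mechanically: the dimension of $Z$ equals the number of free variables in the homogeneous system on $a$, $b$, $c$, $d$. The following is a minimal sketch in plain Python with exact fractions; the function `rref` is a hypothetical helper written for this illustration, not a routine from the text or from Sage.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, pivot column indices), 0-indexed."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    for c in range(len(m[0])):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        # Clear the rest of column c, above and below the pivot.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

# Coefficient matrix of the two conditions defining Z (variables a, b, c, d).
coefficients = [[2, 1, 3, 4],
                [-1, 3, -5, -2]]
reduced, pivots = rref(coefficients)
free = [j for j in range(4) if j not in pivots]
dimension_of_Z = len(free)
print(reduced, pivots, free, dimension_of_Z)
```

Each free variable contributes one basis matrix, which is why the two free variables $c$ and $d$ give a basis of size two.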


Example  DSP4  Dimension  of  a subspace  of  P4 
In Example BSP4 we showed that

\[ S = \{x - 2,\ x^2 - 4x + 4,\ x^3 - 6x^2 + 12x - 8,\ x^4 - 8x^3 + 24x^2 - 32x + 16\} \]

is a basis for $W = \{p(x) \mid p \in P_4,\ p(2) = 0\}$. Thus, the dimension of $W$ is four, $\dim(W) = 4$.

Note that $\dim(P_4) = 5$ by Theorem DP, so $W$ is a subspace of dimension 4 within the vector space $P_4$ of dimension 5, illustrating the upcoming Theorem PSSD. △


Example  DC  Dimension  of  the  crazy  vector  space 

In Example BC we determined that the set $R = \{(1, 0),\ (6, 3)\}$ from the crazy
vector space, $C$ (Example CVS), is a basis for $C$. By Definition D we see that $C$ has
dimension 2, $\dim(C) = 2$. △

It  is  possible  for  a vector  space  to  have  no  finite  bases,  in  which  case  we  say 
it  has  infinite  dimension.  Many  of  the  best  examples  of  this  are  vector  spaces  of 
functions,  which  lead  to  constructions  like  Hilbert  spaces.  We  will  focus  exclusively 
on finite-dimensional vector spaces. OK, one infinite-dimensional example, and then
we  will  focus  exclusively  on  finite-dimensional  vector  spaces. 

Example  VSPUD  Vector  space  of  polynomials  with  unbounded  degree 
Define  the  set  P by 

P = {p | p(x)  is  a polynomial  in  x} 

Our  operations  will  be  the  same  as  those  defined  for  Pn  (Example  VSP). 

With  no  restrictions  on  the  possible  degrees  of  our  polynomials,  any  finite  set 
that  is  a candidate  for  spanning  P will  come  up  short.  We  will  give  a proof  by 
contradiction  (Proof  Technique  CD).  To  this  end,  suppose  that  the  dimension  of  P 
is  finite,  say  dim  (P)  = n. 

The set $T = \{1,\ x,\ x^2,\ \ldots,\ x^n\}$ is a linearly independent set (check this!) containing $n + 1$ polynomials from $P$. However, a basis of $P$ will be a spanning set of $P$ containing $n$ vectors. This situation is a contradiction of Theorem SSLD, so our assumption that $P$ has finite dimension is false. Thus, we say $\dim(P) = \infty$. △

Subsection  RNM 

Rank  and  Nullity  of  a Matrix 

For  any  matrix,  we  have  seen  that  we  can  associate  several  subspaces  — the  null 
space  (Theorem  NSMS),  the  column  space  (Theorem  CSMS),  row  space  (Theorem 
RSMS)  and  the  left  null  space  (Theorem  LNSMS).  As  vector  spaces,  each  of  these 
has  a dimension,  and  for  the  null  space  and  column  space,  they  are  important  enough 
to  warrant  names. 

Definition  NOM  Nullity  Of  a Matrix 

Suppose that $A$ is an $m \times n$ matrix. Then the nullity of $A$ is the dimension of the
null space of $A$, $n(A) = \dim(\mathcal{N}(A))$. □

Definition ROM Rank Of a Matrix

Suppose that $A$ is an $m \times n$ matrix. Then the rank of $A$ is the dimension of the
column space of $A$, $r(A) = \dim(\mathcal{C}(A))$. □

Example  RNM  Rank  and  nullity  of  a matrix 
Let us compute the rank and nullity of

\[
A = \begin{bmatrix}
2 & -4 & -1 & 3 & 2 & 1 & -4 \\
1 & -2 & 0 & 0 & 4 & 0 & 1 \\
-2 & 4 & 1 & 0 & -5 & -4 & -8 \\
1 & -2 & 1 & 1 & 6 & 1 & -3 \\
2 & -4 & -1 & 1 & 4 & -2 & -1 \\
-1 & 2 & 3 & -1 & 6 & 3 & -1
\end{bmatrix}
\]

To do this, we will first row-reduce the matrix since that will help us determine bases for the null space and column space.

\[
\begin{bmatrix}
\boxed{1} & -2 & 0 & 0 & 4 & 0 & 1 \\
0 & 0 & \boxed{1} & 0 & 3 & 0 & -2 \\
0 & 0 & 0 & \boxed{1} & -1 & 0 & -3 \\
0 & 0 & 0 & 0 & 0 & \boxed{1} & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

From this row-equivalent matrix in reduced row-echelon form we record $D = \{1, 3, 4, 6\}$ and $F = \{2, 5, 7\}$.

For each index in $D$, Theorem BCS creates a single basis vector. In total the basis will have 4 vectors, so the column space of $A$ will have dimension 4 and we write $r(A) = 4$.

For each index in $F$, Theorem BNS creates a single basis vector. In total the basis will have 3 vectors, so the null space of $A$ will have dimension 3 and we write $n(A) = 3$. △
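
The computation in Example RNM can be reproduced with exact arithmetic. The sketch below is plain Python rather than the book's Sage, and the helper `rref` is an assumption of this illustration; it records the pivot columns (the set $D$, shifted to 0-based indexing) and reads off the rank and nullity from them.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, pivot column indices), 0-indexed."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column, so it indexes a free variable
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

A = [[2, -4, -1, 3, 2, 1, -4],
     [1, -2, 0, 0, 4, 0, 1],
     [-2, 4, 1, 0, -5, -4, -8],
     [1, -2, 1, 1, 6, 1, -3],
     [2, -4, -1, 1, 4, -2, -1],
     [-1, 2, 3, -1, 6, 3, -1]]

B, pivots = rref(A)
n = len(A[0])
D = [j + 1 for j in pivots]                       # 1-based, as in the text
F = [j + 1 for j in range(n) if j not in pivots]
rank, nullity = len(D), len(F)
print(D, F, rank, nullity)
```

The two counts always add up to the number of columns, since every column index lands in exactly one of $D$ or $F$.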

There  were  no  accidents  or  coincidences  in  the  previous  example  — with  the 
row-reduced  version  of  a matrix  in  hand,  the  rank  and  nullity  are  easy  to  compute. 

Theorem  CRN  Computing  Rank  and  Nullity 

Suppose that $A$ is an $m \times n$ matrix and $B$ is a row-equivalent matrix in reduced
row-echelon form. Let $r$ denote the number of pivot columns (or the number of
nonzero rows). Then $r(A) = r$ and $n(A) = n - r$.

Proof. Theorem BCS provides a basis for the column space by choosing columns
of $A$ that have the same indices as the pivot columns of $B$. In the analysis of
$B$, each leading 1 provides one nonzero row and one pivot column. So there are $r$
column vectors in a basis for $\mathcal{C}(A)$.

Theorem BNS provides a basis for the null space by creating basis vectors of the
null space of $A$ from entries of $B$, one basis vector for each column that is not a
pivot column. So there are $n - r$ column vectors in a basis for $\mathcal{N}(A)$. ■

Every  archetype  (Archetypes)  that  involves  a matrix  lists  its  rank  and  nullity. 
You  may  have  noticed  as  you  studied  the  archetypes  that  the  larger  the  column  space 
is  the  smaller  the  null  space  is.  A simple  corollary  states  this  trade-off  succinctly. 
(See  Proof  Technique  LC.) 

Theorem  RPNC  Rank  Plus  Nullity  is  Columns 
Suppose that $A$ is an $m \times n$ matrix. Then $r(A) + n(A) = n$.

Proof. Let $r$ be the number of nonzero rows in a row-equivalent matrix in reduced
row-echelon form. By Theorem CRN,

\[ r(A) + n(A) = r + (n - r) = n \]
■

When we first introduced $r$ as our standard notation for the number of nonzero
rows in a matrix in reduced row-echelon form you might have thought $r$ stood for
“rows.” Not really: it stands for “rank”!

Subsection  RNNM 

Rank  and  Nullity  of  a Nonsingular  Matrix 


Let  us  take  a look  at  the  rank  and  nullity  of  a square  matrix. 


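Example RNSM Rank and nullity of a square matrix
The matrix

\[
E = \begin{bmatrix}
0 & 4 & -1 & 2 & 2 & 3 & 1 \\
2 & -2 & 1 & -1 & 0 & -4 & -3 \\
-2 & -3 & 9 & -3 & 9 & -1 & 9 \\
-3 & -4 & 9 & 4 & -1 & 6 & -2 \\
-3 & -4 & 6 & -2 & 5 & 9 & -4 \\
9 & -3 & 8 & -2 & -4 & 2 & 4 \\
8 & 2 & 2 & 9 & 3 & 0 & 9
\end{bmatrix}
\]

is row-equivalent to the matrix in reduced row-echelon form,

\[
\begin{bmatrix}
\boxed{1} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \boxed{1} & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \boxed{1} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \boxed{1} & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \boxed{1} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \boxed{1} & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \boxed{1}
\end{bmatrix}
\]

With $n = 7$ columns and $r = 7$ nonzero rows Theorem CRN tells us the rank is $r(E) = 7$ and the nullity is $n(E) = 7 - 7 = 0$. △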

The value of either the nullity or the rank is enough to characterize a nonsingular
matrix.

Theorem  RNNM  Rank  and  Nullity  of  a Nonsingular  Matrix 
Suppose  that  A is  a square  matrix  of  size  n.  The  following  are  equivalent. 

1.  A is  nonsingular. 

2.  The  rank  of  A is  n,  r (A)  = n. 

3.  The  nullity  of  A is  zero,  n(A)  = 0. 

Proof. (1 ⇒ 2) Theorem CSNM says that if $A$ is nonsingular then $\mathcal{C}(A) = \mathbb{C}^n$. If
$\mathcal{C}(A) = \mathbb{C}^n$, then the column space has dimension $n$ by Theorem DCM, so the rank
of $A$ is $n$.

(2 ⇒ 3) Suppose $r(A) = n$. Then Theorem RPNC gives

\begin{align*}
n(A) &= n - r(A) && \text{Theorem RPNC} \\
&= n - n && \text{Hypothesis} \\
&= 0
\end{align*}

(3 ⇒ 1) Suppose $n(A) = 0$, so a basis for the null space of $A$ is the empty set.
This implies that $\mathcal{N}(A) = \{\mathbf{0}\}$ and Theorem NMTNS says $A$ is nonsingular. ■

With  a new  equivalence  for  a nonsingular  matrix,  we  can  update  our  list  of 
equivalences  (Theorem  NME5)  which  now  becomes  a list  requiring  double  digits  to 
number. 

Theorem  NME6  Nonsingular  Matrix  Equivalences,  Round  6 
Suppose  that  A is  a square  matrix  of  size  n.  The  following  are  equivalent. 

1. $A$ is nonsingular.

2. $A$ row-reduces to the identity matrix.

3. The null space of $A$ contains only the zero vector, $\mathcal{N}(A) = \{\mathbf{0}\}$.

4. The linear system $\mathcal{LS}(A, \mathbf{b})$ has a unique solution for every possible choice of $\mathbf{b}$.

5. The columns of $A$ are a linearly independent set.

6. $A$ is invertible.

7. The column space of $A$ is $\mathbb{C}^n$, $\mathcal{C}(A) = \mathbb{C}^n$.

8. The columns of $A$ are a basis for $\mathbb{C}^n$.

9. The rank of $A$ is $n$, $r(A) = n$.

10. The nullity of $A$ is zero, $n(A) = 0$.

Proof.  Building  on  Theorem  NME5  we  can  add  two  of  the  statements  from  Theorem 
RNNM.  ■ 

Reading  Questions 

1. What is the dimension of the vector space $P_6$, the set of all polynomials of degree 6 or
less?

2.  How  are  the  rank  and  nullity  of  a matrix  related? 

3.  Explain  why  we  might  say  that  a nonsingular  matrix  has  “full  rank.” 

Exercises

C20  The  archetypes  listed  below  are  matrices,  or  systems  of  equations  with  coefficient 
matrices.  For  each,  compute  the  nullity  and  rank  of  the  matrix.  This  information  is  listed 
for  each  archetype  (along  with  the  number  of  columns  in  the  matrix,  so  as  to  illustrate 
Theorem  RPNC),  and  notice  how  it  could  have  been  computed  immediately  after  the 
determination  of  the  sets  D and  F associated  with  the  reduced  row-echelon  form  of  the 
matrix. 

Archetype  A,  Archetype  B,  Archetype  C,  Archetype  D/Archetype  E,  Archetype  F,  Archetype 
G/Archetype  H,  Archetype  I,  Archetype  J,  Archetype  K,  Archetype  L 

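C21† Find the dimension of the subspace $W = \left\{ \begin{bmatrix} a + b \\ a + c \\ a + d \\ d \end{bmatrix} \,\middle|\, a, b, c, d \in \mathbb{C} \right\}$ of $\mathbb{C}^4$.

C22† Find the dimension of the subspace $W = \{a + bx + cx^2 + dx^3 \mid a + b + c + d = 0\}$ of $P_3$.

C23† Find the dimension of the subspace $W = \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \,\middle|\, a + b = c,\ b + c = d,\ c + d = a \right\}$ of $M_{22}$.

C30† For the matrix $A$ below, compute the dimension of the null space of $A$, $\dim(\mathcal{N}(A))$.

\[
A = \begin{bmatrix}
2 & -1 & -3 & 11 & 9 \\
1 & 2 & 1 & -7 & -3 \\
3 & 1 & -3 & 6 & 8 \\
2 & 1 & 2 & -5 & -3
\end{bmatrix}
\]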

C31† The set $W$ below is a subspace of $\mathbb{C}^4$. Find the dimension of $W$.


W = 


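C35† Find the rank and nullity of the matrix $A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 2 & 2 \\ 2 & 1 & 1 \\ -1 & 0 & 1 \\ 1 & 1 & 2 \end{bmatrix}$.

C36† Find the rank and nullity of the matrix $A = \begin{bmatrix} 1 & 2 & 1 & 1 & 1 \\ 1 & 3 & 2 & 0 & 4 \\ 1 & 2 & 1 & 1 & 1 \end{bmatrix}$.

C37† Find the rank and nullity of the matrix $A = \begin{bmatrix} 3 & 2 & 1 & 1 & 1 \\ 2 & 3 & 0 & 1 & 1 \\ -1 & 1 & 2 & 1 & 0 \\ 1 & 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 2 & -1 \end{bmatrix}$.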

C40 In Example LDP4 we determined that the set of five polynomials, $T$, is linearly
dependent by a simple invocation of Theorem SSLD. Prove that $T$ is linearly dependent
from scratch, beginning with Definition LI.

M20† $M_{22}$ is the vector space of $2 \times 2$ matrices. Let $S_{22}$ denote the set of all $2 \times 2$
symmetric matrices. That is

\[ S_{22} = \{A \in M_{22} \mid A^t = A\} \]

1. Show that $S_{22}$ is a subspace of $M_{22}$.

2. Exhibit a basis for $S_{22}$ and prove that it has the required properties.

3. What is the dimension of $S_{22}$?

M21† A $2 \times 2$ matrix $B$ is upper triangular if $[B]_{21} = 0$. Let $UT_2$ be the set of all $2 \times 2$
upper triangular matrices. Then $UT_2$ is a subspace of the vector space of all $2 \times 2$ matrices,
$M_{22}$ (you may assume this). Determine the dimension of $UT_2$ providing all of the necessary
justifications for your answer.


Section  PD 

Properties  of  Dimension 

Once  the  dimension  of  a vector  space  is  known,  then  the  determination  of  whether 
or  not  a set  of  vectors  is  linearly  independent,  or  if  it  spans  the  vector  space,  can 
often  be  much  easier.  In  this  section  we  will  state  a workhorse  theorem  and  then 
apply it to the column space and row space of a matrix. It will also help us describe
a super-basis for $\mathbb{C}^m$.

Subsection  GT 
Goldilocks’  Theorem 

We  begin  with  a useful  theorem  that  we  will  need  later,  and  in  the  proof  of  the 
main  theorem  in  this  subsection.  This  theorem  says  that  we  can  extend  linearly 
independent  sets,  one  vector  at  a time,  by  adding  vectors  from  outside  the  span  of 
the  linearly  independent  set,  all  the  while  preserving  the  linear  independence  of  the 
set. 

Theorem  ELIS  Extending  Linearly  Independent  Sets 

Suppose $V$ is a vector space and $S$ is a linearly independent set of vectors from $V$. Suppose $\mathbf{w}$ is a vector such that $\mathbf{w} \notin \langle S \rangle$. Then the set $S' = S \cup \{\mathbf{w}\}$ is linearly independent.

Proof. Suppose $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_m\}$ and begin with a relation of linear dependence on $S'$,

\[ a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + a_3\mathbf{v}_3 + \cdots + a_m\mathbf{v}_m + a_{m+1}\mathbf{w} = \mathbf{0}. \]

There are two cases to consider. First suppose that $a_{m+1} = 0$. Then the relation of linear dependence on $S'$ becomes

\[ a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + a_3\mathbf{v}_3 + \cdots + a_m\mathbf{v}_m = \mathbf{0}. \]

and by the linear independence of the set $S$, we conclude that $a_1 = a_2 = a_3 = \cdots = a_m = 0$. So all of the scalars in the relation of linear dependence on $S'$ are zero.

In the second case, suppose that $a_{m+1} \neq 0$. Then the relation of linear dependence on $S'$ becomes

\begin{align*}
a_{m+1}\mathbf{w} &= -a_1\mathbf{v}_1 - a_2\mathbf{v}_2 - a_3\mathbf{v}_3 - \cdots - a_m\mathbf{v}_m \\
\mathbf{w} &= -\frac{a_1}{a_{m+1}}\mathbf{v}_1 - \frac{a_2}{a_{m+1}}\mathbf{v}_2 - \frac{a_3}{a_{m+1}}\mathbf{v}_3 - \cdots - \frac{a_m}{a_{m+1}}\mathbf{v}_m
\end{align*}

This equation expresses $\mathbf{w}$ as a linear combination of the vectors in $S$, contrary to the assumption that $\mathbf{w} \notin \langle S \rangle$, so this case leads to a contradiction.

The first case yielded only a trivial relation of linear dependence on $S'$ and the second case led to a contradiction. So $S'$ is a linearly independent set since any relation of linear dependence is trivial. ■

In the story Goldilocks and the Three Bears, the young girl Goldilocks visits the
empty house of the three bears while out walking in the woods. One bowl of porridge
is too hot, another too cold, the third is just right. One chair is too hard, one too
soft, the third is just right. So it is with sets of vectors: some are too big (linearly
dependent), some are too small (they do not span), and some are just right (bases).
Here is Goldilocks’ Theorem.

Theorem  G Goldilocks 

Suppose that $V$ is a vector space of dimension $t$. Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_m\}$ be a set of vectors from $V$. Then

1. If $m > t$, then $S$ is linearly dependent.

2. If $m < t$, then $S$ does not span $V$.

3. If $m = t$ and $S$ is linearly independent, then $S$ spans $V$.

4. If $m = t$ and $S$ spans $V$, then $S$ is linearly independent.

Proof. Let $B$ be a basis of $V$. Since $\dim(V) = t$, Definition B and Theorem BIS imply that $B$ is a linearly independent set of $t$ vectors that spans $V$.

1. Suppose to the contrary that $S$ is linearly independent. Then $B$ is a smaller set of vectors that spans $V$. This contradicts Theorem SSLD.

2. Suppose to the contrary that $S$ does span $V$. Then $B$ is a larger set of vectors that is linearly independent. This contradicts Theorem SSLD.

3. Suppose to the contrary that $S$ does not span $V$. Then we can choose a vector $\mathbf{w}$ such that $\mathbf{w} \in V$ and $\mathbf{w} \notin \langle S \rangle$. By Theorem ELIS, the set $S' = S \cup \{\mathbf{w}\}$ is again linearly independent. Then $S'$ is a set of $m + 1 = t + 1$ vectors that are linearly independent, while $B$ is a set of $t$ vectors that span $V$. This contradicts Theorem SSLD.

4. Suppose to the contrary that $S$ is linearly dependent. Then by Theorem DLDS (which can be upgraded, with no changes in the proof, to the setting of a general vector space), there is a vector in $S$, say $\mathbf{v}_k$, that is equal to a linear combination of the other vectors in $S$. Let $S' = S \setminus \{\mathbf{v}_k\}$, the set of “other” vectors in $S$. Then it is easy to show that $V = \langle S \rangle = \langle S' \rangle$. So $S'$ is a set of $m - 1 = t - 1$ vectors that spans $V$, while $B$ is a set of $t$ linearly independent vectors in $V$. This contradicts Theorem SSLD.
■

There is a tension in the construction of a basis. Make a set too big and you will
end  up  with  relations  of  linear  dependence  among  the  vectors.  Make  a set  too  small 
and  you  will  not  have  enough  raw  material  to  span  the  entire  vector  space.  Make  a 
set  just  the  right  size  (the  dimension)  and  you  only  need  to  have  linear  independence 
or  spanning,  and  you  get  the  other  property  for  free.  These  roughly-stated  ideas  are 
made  precise  by  Theorem  G. 

The  structure  and  proof  of  this  theorem  also  deserve  comment.  The  hypotheses 
seem  innocuous.  We  presume  we  know  the  dimension  of  the  vector  space  in  hand, 
then  we  mostly  just  look  at  the  size  of  the  set  S.  From  this  we  get  big  conclusions 
about  spanning  and  linear  independence.  Each  of  the  four  proofs  relies  on  ultimately 
contradicting  Theorem  SSLD,  so  in  a way  we  could  think  of  this  entire  theorem  as  a 
corollary of Theorem SSLD. (See Proof Technique LC.) The proofs of the third and
fourth parts parallel each other in style: introduce $\mathbf{w}$ using Theorem ELIS or toss
$\mathbf{v}_k$ using Theorem DLDS. Then obtain a contradiction to Theorem SSLD.

Theorem  G is  useful  in  both  concrete  examples  and  as  a tool  in  other  proofs.  We 
will  use  it  often  to  bypass  verifying  linear  independence  or  spanning. 
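
To see the four parts of Theorem G side by side, here is a small sketch for the special case $V = \mathbb{Q}^t$ (rational column vectors, stored as Python lists). The names `rank` and `goldilocks` are hypothetical helpers invented for this illustration. A set of $m$ vectors is linearly independent exactly when the matrix having them as rows has rank $m$, and it spans $\mathbb{Q}^t$ exactly when that rank is $t$.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix via forward elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def goldilocks(vectors, t):
    """Report independence and spanning for vectors taken from Q^t."""
    r = rank(vectors)
    return {"linearly_independent": r == len(vectors), "spans": r == t}

too_big = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]   # m = 4 > t = 3
too_small = [[1, 0, 0], [0, 1, 0]]                       # m = 2 < t = 3
just_right = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]           # m = 3 = t

for name, S in [("too big", too_big), ("too small", too_small),
                ("just right", just_right)]:
    print(name, goldilocks(S, 3))
```

The rank can never exceed either $m$ or $t$, which is the computational shadow of parts 1 and 2; when $m = t$ the two tests coincide, which is parts 3 and 4.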


Example  BPR  Bases  for  Pn,  reprised 
In Example BP we claimed that

\begin{align*}
B &= \{1,\ x,\ x^2,\ x^3,\ \ldots,\ x^n\} \\
C &= \{1,\ 1 + x,\ 1 + x + x^2,\ 1 + x + x^2 + x^3,\ \ldots,\ 1 + x + x^2 + x^3 + \cdots + x^n\}
\end{align*}

were both bases for $P_n$ (Example VSP). Suppose we had first verified that $B$ was a basis, so we would then know that $\dim(P_n) = n + 1$. The size of $C$ is $n + 1$, the right size to be a basis. We could then verify that $C$ is linearly independent. We would not have to make any special efforts to prove that $C$ spans $P_n$, since Theorem G would allow us to conclude this property of $C$ directly. Then we would be able to say that $C$ is a basis of $P_n$ also. △


Example  BDM22  Basis  by  dimension  in  M22 
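In Example DSM22 we showed that

\[ B = \left\{ \begin{bmatrix} -2 & 1 \\ 1 & 0 \end{bmatrix},\ \begin{bmatrix} -2 & 0 \\ 0 & 1 \end{bmatrix} \right\} \]

is a basis for the subspace $Z$ of $M_{22}$ (Example VSM) given by

\[ Z = \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \,\middle|\, 2a + b + 3c + 4d = 0,\ -a + 3b - 5c - 2d = 0 \right\} \]

This tells us that $\dim(Z) = 2$. In this example we will find another basis. We can construct two new matrices in $Z$ by forming linear combinations of the matrices in $B$.

\begin{align*}
2\begin{bmatrix} -2 & 1 \\ 1 & 0 \end{bmatrix} + (-3)\begin{bmatrix} -2 & 0 \\ 0 & 1 \end{bmatrix} &= \begin{bmatrix} 2 & 2 \\ 2 & -3 \end{bmatrix} \\
3\begin{bmatrix} -2 & 1 \\ 1 & 0 \end{bmatrix} + 1\begin{bmatrix} -2 & 0 \\ 0 & 1 \end{bmatrix} &= \begin{bmatrix} -8 & 3 \\ 3 & 1 \end{bmatrix}
\end{align*}

Then the set

\[ C = \left\{ \begin{bmatrix} 2 & 2 \\ 2 & -3 \end{bmatrix},\ \begin{bmatrix} -8 & 3 \\ 3 & 1 \end{bmatrix} \right\} \]

has the right size to be a basis of $Z$. Let us see if it is a linearly independent set. The relation of linear dependence

\begin{align*}
a_1\begin{bmatrix} 2 & 2 \\ 2 & -3 \end{bmatrix} + a_2\begin{bmatrix} -8 & 3 \\ 3 & 1 \end{bmatrix} &= \mathcal{O} \\
\begin{bmatrix} 2a_1 - 8a_2 & 2a_1 + 3a_2 \\ 2a_1 + 3a_2 & -3a_1 + a_2 \end{bmatrix} &= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\end{align*}

leads to the homogeneous system of equations whose coefficient matrix

\[ \begin{bmatrix} 2 & -8 \\ 2 & 3 \\ 2 & 3 \\ -3 & 1 \end{bmatrix} \]

row-reduces to

\[ \begin{bmatrix} \boxed{1} & 0 \\ 0 & \boxed{1} \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \]

So with $a_1 = a_2 = 0$ as the only solution, the set is linearly independent. Now we can apply Theorem G to see that $C$ also spans $Z$ and therefore is a second basis for $Z$. △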
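
The two facts that Example BDM22 needs before Theorem G applies (the new matrices lie in $Z$, and they are linearly independent) can be checked directly. This is a hedged sketch: each $2 \times 2$ matrix is flattened to a list `[a, b, c, d]`, the second defining condition of $Z$ is taken to be $-a + 3b - 5c - 2d = 0$ as in Example DSM22, and the helper names are invented for this illustration.

```python
def in_Z(matrix):
    """Check the two defining conditions of Z on a flattened matrix [a, b, c, d]."""
    a, b, c, d = matrix
    return 2 * a + b + 3 * c + 4 * d == 0 and -a + 3 * b - 5 * c - 2 * d == 0

def independent_pair(u, v):
    """Two vectors are linearly independent iff some 2 x 2 minor is nonzero."""
    n = len(u)
    return any(u[i] * v[j] - u[j] * v[i] != 0
               for i in range(n) for j in range(i + 1, n))

B1, B2 = [-2, 1, 1, 0], [-2, 0, 0, 1]        # the basis B of Z
# The two linear combinations formed in the example.
C1 = [2 * x + (-3) * y for x, y in zip(B1, B2)]
C2 = [3 * x + 1 * y for x, y in zip(B1, B2)]

print(C1, C2)
print(in_Z(C1), in_Z(C2), independent_pair(C1, C2))
```

With both checks passing and $\dim(Z) = 2$ already known, Theorem G supplies the spanning property without any further work.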


Example  SVP4  Sets  of  vectors  in  P4 
In Example BSP4 we showed that

\[ B = \{x - 2,\ x^2 - 4x + 4,\ x^3 - 6x^2 + 12x - 8,\ x^4 - 8x^3 + 24x^2 - 32x + 16\} \]

is a basis for $W = \{p(x) \mid p \in P_4,\ p(2) = 0\}$. So $\dim(W) = 4$.

The set

\[ \{3x^2 - 5x - 2,\ 2x^2 - 7x + 6,\ x^3 - 2x^2 + x - 2\} \]

is a subset of $W$ (check this) and it happens to be linearly independent (check this, too). However, by Theorem G it cannot span $W$.

The set

\[ \{3x^2 - 5x - 2,\ 2x^2 - 7x + 6,\ x^3 - 2x^2 + x - 2,\ -x^4 + 2x^3 + 5x^2 - 10x,\ x^4 - 16\} \]

is another subset of $W$ (check this) and Theorem G tells us that it must be linearly dependent.

The set

\[ \{x - 2,\ x^2 - 2x,\ x^3 - 2x^2,\ x^4 - 2x^3\} \]

is a third subset of $W$ (check this) and is linearly independent (check this). Since it has the right size to be a basis, and is linearly independent, Theorem G tells us that it also spans $W$, and therefore is a basis of $W$. △
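
The "check this" claims of Example SVP4 about linear independence reduce to rank computations on coefficient vectors. The sketch below assumes each polynomial in $P_4$ is encoded as a list of five coefficients (degree 4 first); the helper `rank` is written for this illustration only. A set is linearly independent exactly when the rank equals the number of polynomials in it.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix via forward elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

# Coefficients listed for x^4, x^3, x^2, x, 1.
first = [[0, 0, 3, -5, -2], [0, 0, 2, -7, 6], [0, 1, -2, 1, -2]]
second = first + [[-1, 2, 5, -10, 0], [1, 0, 0, 0, -16]]
third = [[0, 0, 0, 1, -2], [0, 0, 1, -2, 0], [0, 1, -2, 0, 0], [1, -2, 0, 0, 0]]

ranks = {"first": rank(first), "second": rank(second), "third": rank(third)}
print(ranks)
```

Three independent vectors are too few to span the 4-dimensional $W$, five vectors cannot have rank above 4, and four independent vectors in $W$ are just right.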

A simple consequence of Theorem G is the observation that a proper subspace
has strictly smaller dimension than its parent vector space. Hopefully this may seem
intuitively obvious, but it still requires proof, and we will cite this result later.

Theorem PSSD Proper Subspaces have Smaller Dimension
Suppose that $U$ and $V$ are subspaces of the vector space $W$, such that $U \subsetneq V$. Then
$\dim(U) < \dim(V)$.

Proof. Suppose that $\dim(U) = m$ and $\dim(V) = t$. Then $U$ has a basis $B$ of size
$m$. If $m > t$, then by Theorem G, $B$ is linearly dependent, which is a contradiction.
If $m = t$, then by Theorem G, $B$ spans $V$. Then $U = \langle B \rangle = V$, also a contradiction.
All that remains is that $m < t$, which is the desired conclusion. ■

The  final  theorem  of  this  subsection  is  an  extremely  powerful  tool  for  establishing 
the  equality  of  two  sets  that  are  subspaces.  Notice  that  the  hypotheses  include  the 
equality  of  two  integers  (dimensions)  while  the  conclusion  is  the  equality  of  two 
sets  (subspaces).  It  is  the  extra  “structure”  of  a vector  space  and  its  dimension  that 
makes  possible  this  huge  leap  from  an  integer  equality  to  a set  equality. 

Theorem  EDYES  Equal  Dimensions  Yields  Equal  Subspaces 

Suppose that $U$ and $V$ are subspaces of the vector space $W$, such that $U \subseteq V$ and
$\dim(U) = \dim(V)$. Then $U = V$.

Proof. We give a proof by contradiction (Proof Technique CD). Suppose to the
contrary that $U \neq V$. Since $U \subseteq V$, there must be a vector $\mathbf{v}$ such that $\mathbf{v} \in V$ and
$\mathbf{v} \notin U$. Let $B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_t\}$ be a basis for $U$. Then, by Theorem ELIS,
the set $C = B \cup \{\mathbf{v}\} = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_t, \mathbf{v}\}$ is a linearly independent set of $t + 1$
vectors in $V$. However, by hypothesis, $V$ has the same dimension as $U$ (namely $t$)
and therefore Theorem G says that $C$ is too big to be linearly independent. This
contradiction shows that $U = V$. ■

Subsection  RT 
Ranks  and  Transposes 

We  now  prove  one  of  the  most  surprising  theorems  about  matrices.  Notice  the  paucity 
of  hypotheses  compared  to  the  precision  of  the  conclusion. 

Theorem  RMRT  Rank  of  a Matrix  is  the  Rank  of  the  Transpose 
Suppose $A$ is an $m \times n$ matrix. Then $r(A) = r(A^t)$.

Proof. Suppose we row-reduce $A$ to the matrix $B$ in reduced row-echelon form, and $B$ has $r$ nonzero rows. The quantity $r$ tells us three things about $B$: the number of leading 1's, the number of nonzero rows and the number of pivot columns. For this proof we will be interested in the latter two.

Theorem BRS and Theorem BCS each has a conclusion that provides a basis, for the row space and the column space, respectively. In each case, these bases contain $r$ vectors. This observation makes the following go.

\begin{align*}
r(A) &= \dim(\mathcal{C}(A)) && \text{Definition ROM} \\
&= r && \text{Theorem BCS} \\
&= \dim(\mathcal{R}(A)) && \text{Theorem BRS} \\
&= \dim(\mathcal{C}(A^t)) && \text{Theorem CSRST} \\
&= r(A^t) && \text{Definition ROM}
\end{align*}

Jacob Linenthal helped with this proof. ■

This  says  that  the  row  space  and  the  column  space  of  a matrix  have  the  same 
dimension,  which  should  be  very  surprising.  It  does  not  say  that  column  space 
and  the  row  space  are  identical.  Indeed,  if  the  matrix  is  not  square,  then  the  sizes 
(number  of  slots)  of  the  vectors  in  each  space  are  different,  so  the  sets  are  not  even 
comparable. 

It  is  not  hard  to  construct  by  yourself  examples  of  matrices  that  illustrate  Theorem 
RMRT,  since  it  applies  equally  well  to  any  matrix.  Grab  a matrix,  row-reduce  it, 
count  the  nonzero  rows  or  the  number  of  pivot  columns.  That  is  the  rank.  Transpose 
the  matrix,  row-reduce  that,  count  the  nonzero  rows  or  the  pivot  columns.  That  is 
the  rank  of  the  transpose.  The  theorem  says  the  two  will  be  equal.  Every  time.  Here 
is  an  example  anyway. 

Example  RRTI  Rank,  rank  of  transpose,  Archetype  I 
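Archetype I has a $4 \times 7$ coefficient matrix which row-reduces to

\[
\begin{bmatrix}
\boxed{1} & 4 & 0 & 0 & 2 & 1 & -3 \\
0 & 0 & \boxed{1} & 0 & 1 & -3 & 5 \\
0 & 0 & 0 & \boxed{1} & 2 & -6 & 6 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

so the rank is 3. Row-reducing the transpose yields

\[
\begin{bmatrix}
\boxed{1} & 0 & 0 & -\frac{31}{7} \\
0 & \boxed{1} & 0 & \frac{12}{7} \\
0 & 0 & \boxed{1} & \frac{13}{7} \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]

demonstrating that the rank of the transpose is also 3. △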
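
The recipe described before Example RRTI (row-reduce, count pivots, transpose, repeat) is easy to try on a matrix of your own. The sketch below uses a small matrix made up for this illustration, not one of the archetypes, and a hypothetical helper `rank` built on exact fractions.

```python
from fractions import Fraction

def rank(rows):
    """Count pivots found by forward elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def transpose(rows):
    """Swap rows and columns."""
    return [list(col) for col in zip(*rows)]

# An illustrative 3 x 4 matrix; its third row is the sum of the first two.
A = [[1, 2, 0, 3],
     [2, 4, 1, 7],
     [3, 6, 1, 10]]

rank_A = rank(A)
rank_At = rank(transpose(A))
print(rank_A, rank_At)
```

The two eliminations work on matrices of different shapes and follow different pivot paths, yet Theorem RMRT guarantees the counts agree.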


Subsection  DFS 
Dimension  of  Four  Subspaces 

That  the  rank  of  a matrix  equals  the  rank  of  its  transpose  is  a fundamental  and 
surprising  result.  However,  applying  Theorem  FS  we  can  easily  determine  the 
dimension  of  all  four  fundamental  subspaces  associated  with  a matrix. 

Theorem  DFS  Dimensions  of  Four  Subspaces 

Suppose that $A$ is an $m \times n$ matrix, and $B$ is a row-equivalent matrix in reduced row-echelon form with $r$ nonzero rows. Then

1. $\dim(\mathcal{N}(A)) = n - r$

2. $\dim(\mathcal{C}(A)) = r$

3. $\dim(\mathcal{R}(A)) = r$

4. $\dim(\mathcal{L}(A)) = m - r$

Proof. If $A$ row-reduces to a matrix in reduced row-echelon form with $r$ nonzero rows, then the matrix $C$ of extended echelon form (Definition EEF) will be an $r \times n$ matrix in reduced row-echelon form with no zero rows and $r$ pivot columns (Theorem PEEF). Similarly, the matrix $L$ of extended echelon form (Definition EEF) will be an $(m - r) \times m$ matrix in reduced row-echelon form with no zero rows and $m - r$ pivot columns (Theorem PEEF).

\begin{align*}
\dim(\mathcal{N}(A)) &= \dim(\mathcal{N}(C)) && \text{Theorem FS} \\
&= n - r && \text{Theorem BNS} \\[1ex]
\dim(\mathcal{C}(A)) &= \dim(\mathcal{N}(L)) && \text{Theorem FS} \\
&= m - (m - r) && \text{Theorem BNS} \\
&= r \\[1ex]
\dim(\mathcal{R}(A)) &= \dim(\mathcal{R}(C)) && \text{Theorem FS} \\
&= r && \text{Theorem BRS} \\[1ex]
\dim(\mathcal{L}(A)) &= \dim(\mathcal{R}(L)) && \text{Theorem FS} \\
&= m - r && \text{Theorem BRS}
\end{align*}
■

There  are  many  different  ways  to  state  and  prove  this  result,  and  indeed,  the 
equality  of  the  dimensions  of  the  column  space  and  row  space  is  just  a slight  expansion 
of  Theorem  RMRT.  However,  we  have  restricted  our  techniques  to  applying  Theorem 
FS  and  then  determining  dimensions  with  bases  provided  by  Theorem  BNS  and 
Theorem  BRS.  This  provides  an  appealing  symmetry  to  the  results  and  the  proof. 

Reading  Questions 


1. Why does Theorem G have the title it does?

2. Why is Theorem RMRT so surprising?

3. Row-reduce the matrix $A$ to reduced row-echelon form. Without any further computations, compute the dimensions of the four subspaces, (a) $\mathcal{N}(A)$, (b) $\mathcal{C}(A)$, (c) $\mathcal{R}(A)$ and (d) $\mathcal{L}(A)$.

$$A = \begin{bmatrix}
1 & -1 & 2 & 8 & 5 \\
1 & 1 & 1 & 4 & -1 \\
0 & 2 & -3 & -8 & -6 \\
2 & 0 & 1 & 8 & 4
\end{bmatrix}$$

Exercises 

C10 Example SVP4 leaves several details for the reader to check. Verify these five claims.

C40† Determine if the set $T = \{x^2 - x + 5,\ 4x^3 - x^2 + 5x,\ 3x + 2\}$ spans the vector space of polynomials with degree 4 or less, $P_4$. (Compare the solution to this exercise with Solution LISS.C40.)

T05 Trivially, if $U$ and $V$ are two subspaces of $W$ with $U = V$, then $\dim(U) = \dim(V)$. Combine this fact, Theorem PSSD, and Theorem EDYES all into one grand combined theorem. You might look to Theorem PIP for stylistic inspiration. (Notice this problem does not ask you to prove anything. It just asks you to roll up three theorems into one compact, logically equivalent statement.)

T10 Prove the following theorem, which could be viewed as a reformulation of parts (3) and (4) of Theorem G, or more appropriately as a corollary of Theorem G (Proof Technique LC).
Suppose $V$ is a vector space and $S$ is a subset of $V$ such that the number of vectors in $S$ equals the dimension of $V$. Then $S$ is linearly independent if and only if $S$ spans $V$.

T15 Suppose that $A$ is an $m \times n$ matrix and let $\min(m, n)$ denote the minimum of $m$ and $n$. Prove that $r(A) \leq \min(m, n)$. (If $m$ and $n$ are two numbers, then $\min(m, n)$ stands for the number that is the smaller of the two. For example $\min(4, 6) = 4$.)

T20† Suppose that $A$ is an $m \times n$ matrix and $b \in \mathbb{C}^m$. Prove that the linear system $\mathcal{LS}(A, b)$ is consistent if and only if $r(A) = r([A \mid b])$.

T25 Suppose that $V$ is a vector space with finite dimension. Let $W$ be any subspace of $V$. Prove that $W$ has finite dimension.

T33† Part of Exercise B.T50 is the half of the proof where we assume the matrix $A$ is nonsingular and prove that a set is a basis. In Solution B.T50 we proved directly that the set was both linearly independent and a spanning set. Shorten this part of the proof by applying Theorem G. Be careful, there is one subtlety.

T60† Suppose that $W$ is a vector space with dimension 5, and $U$ and $V$ are subspaces of $W$, each of dimension 3. Prove that $U \cap V$ contains a nonzero vector. State a more general result.


Chapter  D 
Determinants 


The  determinant  is  a function  that  takes  a square  matrix  as  an  input  and  produces 
a scalar  as  an  output.  So  unlike  a vector  space,  it  is  not  an  algebraic  structure. 
However,  it  has  many  beneficial  properties  for  studying  vector  spaces,  matrices  and 
systems  of  equations,  so  it  is  hard  to  ignore  (though  some  have  tried).  While  the 
properties  of  a determinant  can  be  very  useful,  they  are  also  complicated  to  prove. 


Section  DM 

Determinant  of  a Matrix 

Before  we  define  the  determinant  of  a matrix,  we  take  a slight  detour  to  introduce 
elementary  matrices.  These  will  bring  us  back  to  the  beginning  of  the  course  and 
our  old  friend,  row  operations. 

Subsection  EM 
Elementary  Matrices 

Elementary  matrices  are  very  simple,  as  you  might  have  suspected  from  their  name. 
Their  purpose  is  to  effect  row  operations  (Definition  RO)  on  a matrix  through  matrix 
multiplication  (Definition  MM).  Their  definitions  look  much  more  complicated  than 
they  really  are,  so  be  sure  to  skip  over  them  on  your  first  reading  and  head  right  for 
the  explanation  that  follows  and  the  first  example. 

Definition  ELEM  Elementary  Matrices 


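1. For $i \neq j$, $E_{i,j}$ is the square matrix of size $n$ with

$$[E_{i,j}]_{k\ell} = \begin{cases}
0 & k \neq i,\ k \neq j,\ \ell \neq k \\
1 & k \neq i,\ k \neq j,\ \ell = k \\
0 & k = i,\ \ell \neq j \\
1 & k = i,\ \ell = j \\
0 & k = j,\ \ell \neq i \\
1 & k = j,\ \ell = i
\end{cases}$$

2. For $\alpha \neq 0$, $E_i(\alpha)$ is the square matrix of size $n$ with

$$[E_i(\alpha)]_{k\ell} = \begin{cases}
0 & \ell \neq k \\
1 & k \neq i,\ \ell = k \\
\alpha & k = i,\ \ell = i
\end{cases}$$

3. For $i \neq j$, $E_{i,j}(\alpha)$ is the square matrix of size $n$ with

$$[E_{i,j}(\alpha)]_{k\ell} = \begin{cases}
0 & k \neq j,\ \ell \neq k \\
1 & k \neq j,\ \ell = k \\
0 & k = j,\ \ell \neq i,\ \ell \neq j \\
1 & k = j,\ \ell = j \\
\alpha & k = j,\ \ell = i
\end{cases}$$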

□ 

Again, these matrices are not as complicated as their definitions suggest, since they are just small perturbations of the $n \times n$ identity matrix (Definition IM). $E_{i,j}$ is the identity matrix with rows (or columns) $i$ and $j$ trading places, $E_i(\alpha)$ is the identity matrix where the diagonal entry in row $i$ and column $i$ has been replaced by $\alpha$, and $E_{i,j}(\alpha)$ is the identity matrix where the entry in row $j$ and column $i$ has been replaced by $\alpha$. (Yes, those subscripts look backwards in the description of $E_{i,j}(\alpha)$.) Notice that our notation makes no reference to the size of the elementary matrix, since this will always be apparent from the context, or unimportant.

The  raison  d’etre  for  elementary  matrices  is  to  “do”  row  operations  on  matrices 
with  matrix  multiplication.  So  here  is  an  example  where  we  will  both  see  some 
elementary  matrices  and  see  how  they  accomplish  row  operations  when  used  with 
matrix  multiplication. 

Example  EMRO  Elementary  matrices  and  row  operations 

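We will perform a sequence of row operations (Definition RO) on the $3 \times 4$ matrix $A$, while also multiplying the matrix on the left by the appropriate $3 \times 3$ elementary matrix.

$$A = \begin{bmatrix} 2 & 1 & 3 & 1 \\ 1 & 3 & 2 & 4 \\ 5 & 0 & 3 & 1 \end{bmatrix}$$

\begin{align*}
R_1 \leftrightarrow R_3:\ &\begin{bmatrix} 5 & 0 & 3 & 1 \\ 1 & 3 & 2 & 4 \\ 2 & 1 & 3 & 1 \end{bmatrix}
& E_{1,3}:\ &\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 2 & 1 & 3 & 1 \\ 1 & 3 & 2 & 4 \\ 5 & 0 & 3 & 1 \end{bmatrix}
= \begin{bmatrix} 5 & 0 & 3 & 1 \\ 1 & 3 & 2 & 4 \\ 2 & 1 & 3 & 1 \end{bmatrix} \\
2R_2:\ &\begin{bmatrix} 5 & 0 & 3 & 1 \\ 2 & 6 & 4 & 8 \\ 2 & 1 & 3 & 1 \end{bmatrix}
& E_2(2):\ &\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 5 & 0 & 3 & 1 \\ 1 & 3 & 2 & 4 \\ 2 & 1 & 3 & 1 \end{bmatrix}
= \begin{bmatrix} 5 & 0 & 3 & 1 \\ 2 & 6 & 4 & 8 \\ 2 & 1 & 3 & 1 \end{bmatrix} \\
2R_3 + R_1:\ &\begin{bmatrix} 9 & 2 & 9 & 3 \\ 2 & 6 & 4 & 8 \\ 2 & 1 & 3 & 1 \end{bmatrix}
& E_{3,1}(2):\ &\begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 5 & 0 & 3 & 1 \\ 2 & 6 & 4 & 8 \\ 2 & 1 & 3 & 1 \end{bmatrix}
= \begin{bmatrix} 9 & 2 & 9 & 3 \\ 2 & 6 & 4 & 8 \\ 2 & 1 & 3 & 1 \end{bmatrix}
\end{align*}

△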
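Example EMRO can also be replayed by machine. The sketch below is a plain Python illustration, not the text's Sage; the helper names `identity`, `elem_swap`, `elem_scale`, `elem_add` and `matmul` are hypothetical, and indices are 1-based to match Definition ELEM.

```python
def identity(n):
    """The n x n identity matrix as a list of rows."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def elem_swap(n, i, j):
    """E_{i,j}: the identity with rows i and j trading places."""
    E = identity(n)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E


def elem_scale(n, i, alpha):
    """E_i(alpha): the identity with diagonal entry (i, i) replaced by alpha != 0."""
    if alpha == 0:
        raise ValueError("alpha must be nonzero")
    E = identity(n)
    E[i - 1][i - 1] = alpha
    return E


def elem_add(n, i, j, alpha):
    """E_{i,j}(alpha): the identity with the entry in row j, column i replaced by alpha."""
    E = identity(n)
    E[j - 1][i - 1] = alpha
    return E


def matmul(X, Y):
    """Entry-by-entry matrix product of X and Y."""
    return [[sum(X[r][p] * Y[p][c] for p in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


A = [[2, 1, 3, 1],
     [1, 3, 2, 4],
     [5, 0, 3, 1]]

step1 = matmul(elem_swap(3, 1, 3), A)         # R1 <-> R3
step2 = matmul(elem_scale(3, 2, 2), step1)    # 2 R2
step3 = matmul(elem_add(3, 3, 1, 2), step2)   # 2 R3 + R1
print(step3)  # [[9, 2, 9, 3], [2, 6, 4, 8], [2, 1, 3, 1]]
```

Each left multiplication reproduces the corresponding row operation of the example.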


The  next  three  theorems  establish  that  each  elementary  matrix  effects  a row 
operation  via  matrix  multiplication. 

Theorem  EMDRO  Elementary  Matrices  Do  Row  Operations 

Suppose that $A$ is an $m \times n$ matrix, and $B$ is a matrix of the same size that is obtained from $A$ by a single row operation (Definition RO). Then there is an elementary matrix of size $m$ that will convert $A$ to $B$ via matrix multiplication on the left. More precisely,

1. If the row operation swaps rows $i$ and $j$, then $B = E_{i,j}A$.

2. If the row operation multiplies row $i$ by $\alpha$, then $B = E_i(\alpha)A$.

3. If the row operation multiplies row $i$ by $\alpha$ and adds the result to row $j$, then $B = E_{i,j}(\alpha)A$.

Proof.  In  each  of  the  three  conclusions,  performing  the  row  operation  on  A will 
create  the  matrix  B where  only  one  or  two  rows  will  have  changed.  So  we  will 
establish  the  equality  of  the  matrix  entries  row  by  row,  first  for  the  unchanged  rows, 
then  for  the  changed  rows,  showing  in  each  case  that  the  result  of  the  matrix  product 
is  the  same  as  the  result  of  the  row  operation.  Here  we  go. 

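Row $k$ of the product $E_{i,j}A$, where $k \neq i$, $k \neq j$, is unchanged from $A$,

\begin{align*}
[E_{i,j}A]_{k\ell} &= \sum_{p=1}^{n} [E_{i,j}]_{kp} [A]_{p\ell} && \text{Theorem EMP} \\
&= [E_{i,j}]_{kk} [A]_{k\ell} + \sum_{\substack{p=1 \\ p \neq k}}^{n} [E_{i,j}]_{kp} [A]_{p\ell} && \text{Property CACN} \\
&= 1 [A]_{k\ell} + \sum_{\substack{p=1 \\ p \neq k}}^{n} 0 [A]_{p\ell} && \text{Definition ELEM} \\
&= [A]_{k\ell}
\end{align*}

Row $i$ of the product $E_{i,j}A$ is row $j$ of $A$,

\begin{align*}
[E_{i,j}A]_{i\ell} &= \sum_{p=1}^{n} [E_{i,j}]_{ip} [A]_{p\ell} && \text{Theorem EMP} \\
&= [E_{i,j}]_{ij} [A]_{j\ell} + \sum_{\substack{p=1 \\ p \neq j}}^{n} [E_{i,j}]_{ip} [A]_{p\ell} && \text{Property CACN} \\
&= 1 [A]_{j\ell} + \sum_{\substack{p=1 \\ p \neq j}}^{n} 0 [A]_{p\ell} && \text{Definition ELEM} \\
&= [A]_{j\ell}
\end{align*}

Row $j$ of the product $E_{i,j}A$ is row $i$ of $A$,

\begin{align*}
[E_{i,j}A]_{j\ell} &= \sum_{p=1}^{n} [E_{i,j}]_{jp} [A]_{p\ell} && \text{Theorem EMP} \\
&= [E_{i,j}]_{ji} [A]_{i\ell} + \sum_{\substack{p=1 \\ p \neq i}}^{n} [E_{i,j}]_{jp} [A]_{p\ell} && \text{Property CACN} \\
&= 1 [A]_{i\ell} + \sum_{\substack{p=1 \\ p \neq i}}^{n} 0 [A]_{p\ell} && \text{Definition ELEM} \\
&= [A]_{i\ell}
\end{align*}

So the matrix product $E_{i,j}A$ is the same as the row operation that swaps rows $i$ and $j$.

Row $k$ of the product $E_i(\alpha)A$, where $k \neq i$, is unchanged from $A$,

\begin{align*}
[E_i(\alpha)A]_{k\ell} &= \sum_{p=1}^{n} [E_i(\alpha)]_{kp} [A]_{p\ell} && \text{Theorem EMP} \\
&= [E_i(\alpha)]_{kk} [A]_{k\ell} + \sum_{\substack{p=1 \\ p \neq k}}^{n} [E_i(\alpha)]_{kp} [A]_{p\ell} && \text{Property CACN} \\
&= 1 [A]_{k\ell} + \sum_{\substack{p=1 \\ p \neq k}}^{n} 0 [A]_{p\ell} && \text{Definition ELEM} \\
&= [A]_{k\ell}
\end{align*}

Row $i$ of the product $E_i(\alpha)A$ is $\alpha$ times row $i$ of $A$,

\begin{align*}
[E_i(\alpha)A]_{i\ell} &= \sum_{p=1}^{n} [E_i(\alpha)]_{ip} [A]_{p\ell} && \text{Theorem EMP} \\
&= [E_i(\alpha)]_{ii} [A]_{i\ell} + \sum_{\substack{p=1 \\ p \neq i}}^{n} [E_i(\alpha)]_{ip} [A]_{p\ell} && \text{Property CACN} \\
&= \alpha [A]_{i\ell} + \sum_{\substack{p=1 \\ p \neq i}}^{n} 0 [A]_{p\ell} && \text{Definition ELEM} \\
&= \alpha [A]_{i\ell}
\end{align*}

So the matrix product $E_i(\alpha)A$ is the same as the row operation that multiplies row $i$ by $\alpha$.

Row $k$ of the product $E_{i,j}(\alpha)A$, where $k \neq j$, is unchanged from $A$,

\begin{align*}
[E_{i,j}(\alpha)A]_{k\ell} &= \sum_{p=1}^{n} [E_{i,j}(\alpha)]_{kp} [A]_{p\ell} && \text{Theorem EMP} \\
&= [E_{i,j}(\alpha)]_{kk} [A]_{k\ell} + \sum_{\substack{p=1 \\ p \neq k}}^{n} [E_{i,j}(\alpha)]_{kp} [A]_{p\ell} && \text{Property CACN} \\
&= 1 [A]_{k\ell} + \sum_{\substack{p=1 \\ p \neq k}}^{n} 0 [A]_{p\ell} && \text{Definition ELEM} \\
&= [A]_{k\ell}
\end{align*}

Row $j$ of the product $E_{i,j}(\alpha)A$ is $\alpha$ times row $i$ of $A$ added to row $j$ of $A$,

\begin{align*}
[E_{i,j}(\alpha)A]_{j\ell} &= \sum_{p=1}^{n} [E_{i,j}(\alpha)]_{jp} [A]_{p\ell} && \text{Theorem EMP} \\
&= [E_{i,j}(\alpha)]_{jj} [A]_{j\ell} + [E_{i,j}(\alpha)]_{ji} [A]_{i\ell} + \sum_{\substack{p=1 \\ p \neq j,\, i}}^{n} [E_{i,j}(\alpha)]_{jp} [A]_{p\ell} && \text{Property CACN} \\
&= 1 [A]_{j\ell} + \alpha [A]_{i\ell} + \sum_{\substack{p=1 \\ p \neq j,\, i}}^{n} 0 [A]_{p\ell} && \text{Definition ELEM} \\
&= [A]_{j\ell} + \alpha [A]_{i\ell}
\end{align*}

So the matrix product $E_{i,j}(\alpha)A$ is the same as the row operation that multiplies row $i$ by $\alpha$ and adds the result to row $j$. ■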

Later  in  this  section  we  will  need  two  facts  about  elementary  matrices. 

Theorem  EMN  Elementary  Matrices  are  Nonsingular 
If  E is  an  elementary  matrix,  then  E is  nonsingular. 

Proof. We show that we can row-reduce each elementary matrix to the identity matrix. Given an elementary matrix of the form $E_{i,j}$, perform the row operation that swaps row $j$ with row $i$. Given an elementary matrix of the form $E_i(\alpha)$, with $\alpha \neq 0$, perform the row operation that multiplies row $i$ by $1/\alpha$. Given an elementary matrix of the form $E_{i,j}(\alpha)$, with $\alpha \neq 0$, perform the row operation that multiplies row $i$ by $-\alpha$ and adds it to row $j$. In each case, the result of the single row operation is the identity matrix. So each elementary matrix is row-equivalent to the identity matrix, and by Theorem NMRRI is nonsingular. ■
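The three undoing row operations in this proof can be checked numerically by multiplying each elementary matrix on the left by the elementary matrix of its undoing operation (Theorem EMDRO) and comparing with the identity. This is only a sketch in Python: the helper names, the size `n = 4`, and the scalar `alpha = 3/2` are hypothetical choices for illustration, with `Fraction` used so that $1/\alpha$ is exact.

```python
from fractions import Fraction


def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def elem_swap(n, i, j):
    """E_{i,j} (1-based indices)."""
    E = identity(n)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E


def elem_scale(n, i, alpha):
    """E_i(alpha), with alpha nonzero."""
    E = identity(n)
    E[i - 1][i - 1] = alpha
    return E


def elem_add(n, i, j, alpha):
    """E_{i,j}(alpha): entry in row j, column i is alpha."""
    E = identity(n)
    E[j - 1][i - 1] = alpha
    return E


def matmul(X, Y):
    return [[sum(X[r][p] * Y[p][c] for p in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


n = 4
alpha = Fraction(3, 2)
I = identity(n)

# Left-multiply each elementary matrix by the matrix of its undoing row operation.
undone = [
    matmul(elem_swap(n, 2, 4), elem_swap(n, 2, 4)),
    matmul(elem_scale(n, 3, 1 / alpha), elem_scale(n, 3, alpha)),
    matmul(elem_add(n, 1, 3, -alpha), elem_add(n, 1, 3, alpha)),
]
print(all(P == I for P in undone))  # True
```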

Notice  that  we  have  now  made  use  of  the  nonzero  restriction  on  a in  the  definition 
of  Ei  (a).  One  more  key  property  of  elementary  matrices. 

Theorem  NMPEM  Nonsingular  Matrices  are  Products  of  Elementary  Matrices 
Suppose that $A$ is a nonsingular matrix. Then there exist elementary matrices $E_1, E_2, E_3, \ldots, E_t$ so that $A = E_1 E_2 E_3 \cdots E_t$.

Proof. Since $A$ is nonsingular, it is row-equivalent to the identity matrix by Theorem NMRRI, so there is a sequence of $t$ row operations that converts $I$ to $A$. For each of these row operations, form the associated elementary matrix from Theorem EMDRO and denote these matrices by $E_1, E_2, E_3, \ldots, E_t$. Applying the first row operation to $I$ yields the matrix $E_1 I$. The second row operation yields $E_2(E_1 I)$, and the third row operation creates $E_3 E_2 E_1 I$. The result of the full sequence of $t$ row operations will yield $A$, so

$$A = E_t \cdots E_3 E_2 E_1 I = E_t \cdots E_3 E_2 E_1$$

Other than the cosmetic matter of re-indexing these elementary matrices in the opposite order, this is the desired result. ■

Subsection DD

Definition  of  the  Determinant 


We  will  now  turn  to  the  definition  of  a determinant  and  do  some  sample  computations. 
The  definition  of  the  determinant  function  is  recursive,  that  is,  the  determinant  of 
a large  matrix  is  defined  in  terms  of  the  determinant  of  smaller  matrices.  To  this 
end,  we  will  make  a few  definitions. 


Definition  SM  SubMatrix 

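Suppose that $A$ is an $m \times n$ matrix. Then the submatrix $A(i|j)$ is the $(m-1) \times (n-1)$ matrix obtained from $A$ by removing row $i$ and column $j$. □


Example SS Some submatrices
For the matrix

$$A = \begin{bmatrix} 1 & -2 & 3 & 9 \\ 4 & -2 & 0 & 1 \\ 3 & 5 & 2 & 1 \end{bmatrix}$$

we have the submatrices

$$A(2|3) = \begin{bmatrix} 1 & -2 & 9 \\ 3 & 5 & 1 \end{bmatrix}
\qquad
A(3|1) = \begin{bmatrix} -2 & 3 & 9 \\ -2 & 0 & 1 \end{bmatrix}$$

△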


Definition  DM  Determinant  of  a Matrix 

Suppose $A$ is a square matrix. Then its determinant, $\det(A) = |A|$, is an element of $\mathbb{C}$ defined recursively by:

1. If $A$ is a $1 \times 1$ matrix, then $\det(A) = [A]_{11}$.

2. If $A$ is a matrix of size $n$ with $n \geq 2$, then

\begin{align*}
\det(A) &= [A]_{11}\det(A(1|1)) - [A]_{12}\det(A(1|2)) + [A]_{13}\det(A(1|3)) \\
&\quad - [A]_{14}\det(A(1|4)) + \cdots + (-1)^{n+1}[A]_{1n}\det(A(1|n))
\end{align*}


□ 

So to compute the determinant of a $5 \times 5$ matrix we must build 5 submatrices, each of size 4. To compute the determinant of each of the $4 \times 4$ matrices we need to create 4 submatrices each, these now of size 3, and so on. To compute the determinant of a $10 \times 10$ matrix would require computing the determinant of $10! = 10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 = 3{,}628{,}800$ matrices of size $1 \times 1$. Fortunately there are better ways. However, this does suggest an excellent computer programming exercise: write a recursive procedure to compute a determinant.
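One possible response to that programming exercise is sketched below in Python. It follows Definition DM literally (expansion about the first row), so it is only practical for small matrices; the names `submatrix` and `det` are hypothetical, and the test matrix is the one used in Example D33M, which follows.

```python
def submatrix(A, i, j):
    """A(i|j): remove row i and column j (1-based, as in Definition SM)."""
    return [row[:j - 1] + row[j:]
            for k, row in enumerate(A, start=1) if k != i]


def det(A):
    """Determinant by recursive expansion about the first row (Definition DM)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(1, n + 1):
        sign = (-1) ** (1 + j)
        total += sign * A[0][j - 1] * det(submatrix(A, 1, j))
    return total


A = [[3, 2, -1],
     [4, 1, 6],
     [-3, -1, 2]]
print(det(A))  # -27
```

The recursion bottoms out at $1 \times 1$ matrices, so the running time grows like $n!$, which is exactly the blow-up described above.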

Let  us  compute  the  determinant  of  a reasonably  sized  matrix  by  hand. 
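Example D33M Determinant of a 3 x 3 matrix
Suppose that we have the $3 \times 3$ matrix

$$A = \begin{bmatrix} 3 & 2 & -1 \\ 4 & 1 & 6 \\ -3 & -1 & 2 \end{bmatrix}$$

Then

\begin{align*}
\det(A) = |A| &= \begin{vmatrix} 3 & 2 & -1 \\ 4 & 1 & 6 \\ -3 & -1 & 2 \end{vmatrix} \\
&= 3 \begin{vmatrix} 1 & 6 \\ -1 & 2 \end{vmatrix}
 - 2 \begin{vmatrix} 4 & 6 \\ -3 & 2 \end{vmatrix}
 + (-1) \begin{vmatrix} 4 & 1 \\ -3 & -1 \end{vmatrix} \\
&= 3\left(1 \begin{vmatrix} 2 \end{vmatrix} - 6 \begin{vmatrix} -1 \end{vmatrix}\right)
 - 2\left(4 \begin{vmatrix} 2 \end{vmatrix} - 6 \begin{vmatrix} -3 \end{vmatrix}\right)
 - \left(4 \begin{vmatrix} -1 \end{vmatrix} - 1 \begin{vmatrix} -3 \end{vmatrix}\right) \\
&= 3\left(1(2) - 6(-1)\right) - 2\left(4(2) - 6(-3)\right) - \left(4(-1) - 1(-3)\right) \\
&= 24 - 52 + 1 \\
&= -27
\end{align*}

△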


In practice it is a bit silly to decompose a $2 \times 2$ matrix down into a couple of $1 \times 1$ matrices and then compute the exceedingly easy determinant of these puny matrices. So here is a simple theorem.

Theorem DMST Determinant of Matrices of Size Two
Suppose that $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Then $\det(A) = ad - bc$.

Proof. Applying Definition DM,

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = a \begin{vmatrix} d \end{vmatrix} - b \begin{vmatrix} c \end{vmatrix} = ad - bc$$

■

Do you recall seeing the expression $ad - bc$ before? (Hint: Theorem TTMI)

Subsection  CD 
Computing  Determinants 

There  are  a variety  of  ways  to  compute  the  determinant.  We  will  establish  first 
that  we  can  choose  to  mimic  our  definition  of  the  determinant,  but  by  using  matrix 
entries  and  submatrices  based  on  a row  other  than  the  first  one. 

Theorem  DER  Determinant  Expansion  about  Rows 
Suppose that $A$ is a square matrix of size $n$. Then for $1 \leq i \leq n$

\begin{align*}
\det(A) &= (-1)^{i+1}[A]_{i1}\det(A(i|1)) + (-1)^{i+2}[A]_{i2}\det(A(i|2)) \\
&\quad + (-1)^{i+3}[A]_{i3}\det(A(i|3)) + \cdots + (-1)^{i+n}[A]_{in}\det(A(i|n))
\end{align*}

which is known as expansion about row $i$.

Proof. First, the statement of the theorem coincides with Definition DM when $i = 1$, so throughout, we need only consider $i > 1$.

Given the recursive definition of the determinant, it should be no surprise that we will use induction for this proof (Proof Technique I). When $n = 1$, there is nothing to prove since there is but one row. When $n = 2$, we just examine expansion about the second row,

\begin{align*}
(-1)^{2+1}[A]_{21}\det(A(2|1)) + (-1)^{2+2}[A]_{22}\det(A(2|2))
&= -[A]_{21}[A]_{12} + [A]_{22}[A]_{11} && \text{Definition DM} \\
&= [A]_{11}[A]_{22} - [A]_{12}[A]_{21} \\
&= \det(A) && \text{Theorem DMST}
\end{align*}

So the theorem is true for matrices of size $n = 1$ and $n = 2$. Now assume the result is true for all matrices of size $n - 1$ as we derive an expression for expansion about row $i$ for a matrix of size $n$. We will abuse our notation for a submatrix slightly, so $A(i_1, i_2 | j_1, j_2)$ will denote the matrix formed by removing rows $i_1$ and $i_2$, along with removing columns $j_1$ and $j_2$. Also, as we take a determinant of a submatrix, we will need to "jump up" the index of summation partway through as we "skip over" a missing column. To do this smoothly we will set

$$\epsilon_{\ell j} = \begin{cases} 0 & \ell < j \\ 1 & \ell > j \end{cases}$$

Now,

\begin{align*}
\det(A) &= \sum_{j=1}^{n} (-1)^{1+j}[A]_{1j}\det(A(1|j)) && \text{Definition DM} \\
&= \sum_{j=1}^{n} (-1)^{1+j}[A]_{1j} \sum_{\substack{1 \leq \ell \leq n \\ \ell \neq j}} (-1)^{i-1+\ell-\epsilon_{\ell j}}[A]_{i\ell}\det(A(1,i|j,\ell)) && \text{Induction} \\
&= \sum_{j=1}^{n} \sum_{\substack{1 \leq \ell \leq n \\ \ell \neq j}} (-1)^{j+i+\ell-\epsilon_{\ell j}}[A]_{1j}[A]_{i\ell}\det(A(1,i|j,\ell)) && \text{Property DCN} \\
&= \sum_{\ell=1}^{n} \sum_{\substack{1 \leq j \leq n \\ j \neq \ell}} (-1)^{j+i+\ell-\epsilon_{\ell j}}[A]_{1j}[A]_{i\ell}\det(A(1,i|j,\ell)) && \text{Property CACN} \\
&= \sum_{\ell=1}^{n} (-1)^{i+\ell}[A]_{i\ell} \sum_{\substack{1 \leq j \leq n \\ j \neq \ell}} (-1)^{j-\epsilon_{\ell j}}[A]_{1j}\det(A(1,i|j,\ell)) && \text{Property DCN} \\
&= \sum_{\ell=1}^{n} (-1)^{i+\ell}[A]_{i\ell} \sum_{\substack{1 \leq j \leq n \\ j \neq \ell}} (-1)^{\epsilon_{\ell j}+j}[A]_{1j}\det(A(i,1|\ell,j)) && 2\epsilon_{\ell j} \text{ is even} \\
&= \sum_{\ell=1}^{n} (-1)^{i+\ell}[A]_{i\ell}\det(A(i|\ell)) && \text{Definition DM}
\end{align*}

■

We  can  also  obtain  a formula  that  computes  a determinant  by  expansion  about 
a column,  but  this  will  be  simpler  if  we  first  prove  a result  about  the  interplay  of 
determinants  and  transposes.  Notice  how  the  following  proof  makes  use  of  the  ability 
to  compute  a determinant  by  expanding  about  any  row. 

Theorem  DT  Determinant  of  the  Transpose 

Suppose that $A$ is a square matrix. Then $\det(A^t) = \det(A)$.


Proof. With our definition of the determinant (Definition DM) and theorems like Theorem DER, using induction (Proof Technique I) is a natural approach to proving properties of determinants. And so it is here. Let $n$ be the size of the matrix $A$, and we will use induction on $n$.

For $n = 1$, the transpose of a matrix is identical to the original matrix, so trivially, the determinants are equal.

Now assume the result is true for matrices of size $n - 1$. Then,

\begin{align*}
\det(A^t) &= \frac{1}{n} \sum_{i=1}^{n} \det(A^t) \\
&= \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} (-1)^{i+j}[A^t]_{ij}\det(A^t(i|j)) && \text{Theorem DER} \\
&= \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} (-1)^{i+j}[A]_{ji}\det(A^t(i|j)) && \text{Definition TM} \\
&= \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} (-1)^{i+j}[A]_{ji}\det\left((A(j|i))^t\right) && \text{Definition TM} \\
&= \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} (-1)^{i+j}[A]_{ji}\det(A(j|i)) && \text{Induction Hypothesis} \\
&= \frac{1}{n} \sum_{j=1}^{n} \sum_{i=1}^{n} (-1)^{j+i}[A]_{ji}\det(A(j|i)) && \text{Property CACN} \\
&= \frac{1}{n} \sum_{j=1}^{n} \det(A) && \text{Theorem DER} \\
&= \det(A)
\end{align*}

■

Now  we  can  easily  get  the  result  that  a determinant  can  be  computed  by  expansion 
about  any  column  as  well. 

Theorem  DEC  Determinant  Expansion  about  Columns 
Suppose that $A$ is a square matrix of size $n$. Then for $1 \leq j \leq n$

\begin{align*}
\det(A) &= (-1)^{1+j}[A]_{1j}\det(A(1|j)) + (-1)^{2+j}[A]_{2j}\det(A(2|j)) \\
&\quad + (-1)^{3+j}[A]_{3j}\det(A(3|j)) + \cdots + (-1)^{n+j}[A]_{nj}\det(A(n|j))
\end{align*}

which is known as expansion about column $j$.

Proof.

\begin{align*}
\det(A) &= \det(A^t) && \text{Theorem DT} \\
&= \sum_{i=1}^{n} (-1)^{j+i}[A^t]_{ji}\det(A^t(j|i)) && \text{Theorem DER} \\
&= \sum_{i=1}^{n} (-1)^{j+i}[A^t]_{ji}\det\left((A(i|j))^t\right) && \text{Definition TM} \\
&= \sum_{i=1}^{n} (-1)^{j+i}[A^t]_{ji}\det(A(i|j)) && \text{Theorem DT} \\
&= \sum_{i=1}^{n} (-1)^{i+j}[A]_{ij}\det(A(i|j)) && \text{Definition TM}
\end{align*}

■


That the determinant of an $n \times n$ matrix can be computed in $2n$ different (albeit similar) ways is nothing short of remarkable. For the doubters among us, we will do an example, computing the determinant of a $4 \times 4$ matrix in two different ways.
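Example TCSD Two computations, same determinant
Let

$$A = \begin{bmatrix} -2 & 3 & 0 & 1 \\ 9 & -2 & 0 & 1 \\ 1 & 3 & -2 & -1 \\ 4 & 1 & 2 & 6 \end{bmatrix}$$

Then expanding about the fourth row (Theorem DER with $i = 4$) yields,

\begin{align*}
|A| &= (4)(-1)^{4+1} \begin{vmatrix} 3 & 0 & 1 \\ -2 & 0 & 1 \\ 3 & -2 & -1 \end{vmatrix}
 + (1)(-1)^{4+2} \begin{vmatrix} -2 & 0 & 1 \\ 9 & 0 & 1 \\ 1 & -2 & -1 \end{vmatrix} \\
&\quad + (2)(-1)^{4+3} \begin{vmatrix} -2 & 3 & 1 \\ 9 & -2 & 1 \\ 1 & 3 & -1 \end{vmatrix}
 + (6)(-1)^{4+4} \begin{vmatrix} -2 & 3 & 0 \\ 9 & -2 & 0 \\ 1 & 3 & -2 \end{vmatrix} \\
&= (-4)(10) + (1)(-22) + (-2)(61) + 6(46) = 92
\end{align*}

Expanding about column 3 (Theorem DEC with $j = 3$) gives

\begin{align*}
|A| &= (0)(-1)^{1+3} \begin{vmatrix} 9 & -2 & 1 \\ 1 & 3 & -1 \\ 4 & 1 & 6 \end{vmatrix}
 + (0)(-1)^{2+3} \begin{vmatrix} -2 & 3 & 1 \\ 1 & 3 & -1 \\ 4 & 1 & 6 \end{vmatrix} \\
&\quad + (-2)(-1)^{3+3} \begin{vmatrix} -2 & 3 & 1 \\ 9 & -2 & 1 \\ 4 & 1 & 6 \end{vmatrix}
 + (2)(-1)^{4+3} \begin{vmatrix} -2 & 3 & 1 \\ 9 & -2 & 1 \\ 1 & 3 & -1 \end{vmatrix} \\
&= 0 + 0 + (-2)(-107) + (-2)(61) = 92
\end{align*}

Notice how much easier the second computation was. By choosing to expand about the third column, we have two entries that are zero, so two $3 \times 3$ determinants need not be computed at all! △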
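For readers who would like to see all $2n$ expansions at once, here is a small Python sketch that expands the matrix of Example TCSD about each of its four rows and each of its four columns. The function names `submatrix`, `det`, `expand_row` and `expand_col` are hypothetical; `det` itself follows Definition DM, and the other two follow the formulas in Theorem DER and Theorem DEC.

```python
def submatrix(A, i, j):
    """A(i|j): remove row i and column j (1-based)."""
    return [row[:j - 1] + row[j:]
            for k, row in enumerate(A, start=1) if k != i]


def det(A):
    """Determinant by expansion about the first row (Definition DM)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** (1 + j) * A[0][j - 1] * det(submatrix(A, 1, j))
               for j in range(1, len(A) + 1))


def expand_row(A, i):
    """Expansion about row i (Theorem DER)."""
    return sum((-1) ** (i + j) * A[i - 1][j - 1] * det(submatrix(A, i, j))
               for j in range(1, len(A) + 1))


def expand_col(A, j):
    """Expansion about column j (Theorem DEC)."""
    return sum((-1) ** (i + j) * A[i - 1][j - 1] * det(submatrix(A, i, j))
               for i in range(1, len(A) + 1))


A = [[-2, 3, 0, 1],
     [9, -2, 0, 1],
     [1, 3, -2, -1],
     [4, 1, 2, 6]]

values = [expand_row(A, i) for i in range(1, 5)] + \
         [expand_col(A, j) for j in range(1, 5)]
print(values)  # eight copies of 92
```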


When  a matrix  has  all  zeros  above  (or  below)  the  diagonal,  exploiting  the  zeros 
by  expanding  about  the  proper  row  or  column  makes  computing  a determinant 
insanely  easy. 


Example  DUTM  Determinant  of  an  upper  triangular  matrix 
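Suppose that

$$T = \begin{bmatrix}
2 & 3 & -1 & 3 & 3 \\
0 & -1 & 5 & 2 & -1 \\
0 & 0 & 3 & 9 & 2 \\
0 & 0 & 0 & -1 & 3 \\
0 & 0 & 0 & 0 & 5
\end{bmatrix}$$

We will compute the determinant of this $5 \times 5$ matrix by consistently expanding about the first column for each submatrix that arises and does not have a zero entry multiplying it.

\begin{align*}
\det(T) &= \begin{vmatrix}
2 & 3 & -1 & 3 & 3 \\
0 & -1 & 5 & 2 & -1 \\
0 & 0 & 3 & 9 & 2 \\
0 & 0 & 0 & -1 & 3 \\
0 & 0 & 0 & 0 & 5
\end{vmatrix} \\
&= 2(-1)^{1+1} \begin{vmatrix}
-1 & 5 & 2 & -1 \\
0 & 3 & 9 & 2 \\
0 & 0 & -1 & 3 \\
0 & 0 & 0 & 5
\end{vmatrix} \\
&= 2(-1)(-1)^{1+1} \begin{vmatrix}
3 & 9 & 2 \\
0 & -1 & 3 \\
0 & 0 & 5
\end{vmatrix} \\
&= 2(-1)(3)(-1)^{1+1} \begin{vmatrix}
-1 & 3 \\
0 & 5
\end{vmatrix} \\
&= 2(-1)(3)(-1)(-1)^{1+1} \begin{vmatrix} 5 \end{vmatrix} \\
&= 2(-1)(3)(-1)(5) = 30
\end{align*}

△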
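The pattern in Example DUTM, where each expansion about the first column leaves only the diagonal entry, means that the determinant of a triangular matrix is simply the product of its diagonal entries. The Python sketch below compares that shortcut with the recursive definition on the matrix $T$ of the example; the helper names are hypothetical, and `det` is a direct (slow) transcription of Definition DM used only as a check.

```python
from math import prod


def submatrix(A, i, j):
    """A(i|j): remove row i and column j (1-based)."""
    return [row[:j - 1] + row[j:]
            for k, row in enumerate(A, start=1) if k != i]


def det(A):
    """Determinant by expansion about the first row (Definition DM)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** (1 + j) * A[0][j - 1] * det(submatrix(A, 1, j))
               for j in range(1, len(A) + 1))


def diagonal_product(A):
    """Product of the diagonal entries of a square matrix."""
    return prod(A[k][k] for k in range(len(A)))


T = [[2, 3, -1, 3, 3],
     [0, -1, 5, 2, -1],
     [0, 0, 3, 9, 2],
     [0, 0, 0, -1, 3],
     [0, 0, 0, 0, 5]]

print(diagonal_product(T), det(T))  # 30 30
```

The shortcut needs only $n - 1$ multiplications, while the recursive definition visits every term of the full expansion.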


When  you  consult  other  texts  in  your  study  of  determinants,  you  may  run  into 
the  terms  “minor”  and  “cofactor,”  especially  in  a discussion  centered  on  expansion 
about  rows  and  columns.  We  have  chosen  not  to  make  these  definitions  formally 
since we have been able to get along without them. However, informally, a minor is a determinant of a submatrix, specifically $\det(A(i|j))$, and is usually referenced as the minor of $[A]_{ij}$. A cofactor is a signed minor, specifically the cofactor of $[A]_{ij}$ is $(-1)^{i+j}\det(A(i|j))$.


Reading  Questions 

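1. Construct the elementary matrix that will effect the row operation $-6R_2 + R_3$ on a $4 \times 7$ matrix.

2. Compute the determinant of the matrix

$$\begin{bmatrix} 2 & 3 & -1 \\ 3 & 8 & 2 \\ 4 & -1 & -3 \end{bmatrix}$$

3. Compute the determinant of the matrix

$$\begin{bmatrix}
3 & 9 & -2 & 4 & 2 \\
0 & 1 & 4 & -2 & 7 \\
0 & 0 & -2 & 5 & 2 \\
0 & 0 & 0 & -1 & 6 \\
0 & 0 & 0 & 0 & 4
\end{bmatrix}$$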
Exercises

C21† Doing the computations by hand, find the determinant of the matrix below.

$$\begin{bmatrix} 1 & 3 \\ 6 & 2 \end{bmatrix}$$

C22† Doing the computations by hand, find the determinant of the matrix below.

$$\begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix}$$

C23† Doing the computations by hand, find the determinant of the matrix below.

$$\begin{bmatrix} 1 & 3 & 2 \\ 4 & 1 & 3 \\ 1 & 0 & 1 \end{bmatrix}$$

C24† Doing the computations by hand, find the determinant of the matrix below.

$$\begin{bmatrix} -2 & 3 & -2 \\ -4 & -2 & 1 \\ 2 & 4 & 2 \end{bmatrix}$$

C25† Doing the computations by hand, find the determinant of the matrix below.

$$\begin{bmatrix} 3 & -1 & 4 \\ 2 & 5 & 1 \\ 2 & 0 & 6 \end{bmatrix}$$

C26† Doing the computations by hand, find the determinant of the matrix $A$.

$$A = \begin{bmatrix} 2 & 0 & 3 & 2 \\ 5 & 1 & 2 & 4 \\ 3 & 0 & 1 & 2 \\ 5 & 3 & 2 & 1 \end{bmatrix}$$

C27† Doing the computations by hand, find the determinant of the matrix $A$.

$$A = \begin{bmatrix} 1 & 0 & 1 & 1 \\ 2 & 2 & -1 & 1 \\ 2 & 1 & 3 & 0 \\ 1 & 1 & 0 & 1 \end{bmatrix}$$

C28† Doing the computations by hand, find the determinant of the matrix $A$.

$$A = \begin{bmatrix} 1 & 0 & 1 & 1 \\ 2 & -1 & -1 & 1 \\ 2 & 5 & 3 & 0 \\ 1 & -1 & 0 & 1 \end{bmatrix}$$
C29† Doing the computations by hand, find the determinant of the matrix $A$.

$$A = \begin{bmatrix}
2 & 3 & 0 & 2 & 1 \\
0 & 1 & 1 & 1 & 2 \\
0 & 0 & 1 & 2 & 3 \\
0 & 1 & 2 & 1 & 0 \\
0 & 0 & 0 & 1 & 2
\end{bmatrix}$$

C30† Doing the computations by hand, find the determinant of the matrix $A$.

$$A = \begin{bmatrix}
2 & 1 & 1 & 0 & 1 \\
2 & 1 & 2 & -1 & 1 \\
0 & 0 & 1 & 2 & 0 \\
1 & 0 & 3 & 1 & 1 \\
2 & 1 & 1 & 2 & 1
\end{bmatrix}$$

M10† Find a value of $k$ so that the matrix $A = \begin{bmatrix} 2 & 4 \\ 3 & k \end{bmatrix}$ has $\det(A) = 0$, or explain why it is not possible.

M11† Find a value of $k$ so that the matrix $A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 0 & 1 \\ 2 & 3 & k \end{bmatrix}$ has $\det(A) = 0$, or explain why it is not possible.

M15† Given the matrix $B = \begin{bmatrix} 2 - x & 1 \\ 4 & 2 - x \end{bmatrix}$, find all values of $x$ that are solutions of $\det(B) = 0$.

M16† Given the matrix $B = \begin{bmatrix} 4 - x & -4 & -4 \\ 2 & -2 - x & -4 \\ 3 & -3 & -4 - x \end{bmatrix}$, find all values of $x$ that are solutions of $\det(B) = 0$.

M30 The two matrices below are row-equivalent. How would you confirm this? Since the matrices are row-equivalent, there is a sequence of row operations that converts $X$ into $Y$, which would be a product of elementary matrices, $M$, such that $MX = Y$. Find $M$. (This approach could be used to find the "9 scalars" of the very early Exercise RREF.M40.)
Hint: Compute the extended echelon form for both matrices, and then use the property from Theorem PEEF that reads $B = JA$, where $A$ is the original matrix, $B$ is the echelon form of the matrix and $J$ is a nonsingular matrix obtained from extended echelon form. Combine the two square matrices in the right way to obtain $M$.

$$X = \begin{bmatrix}
-1 & 3 & 1 & -2 & 8 \\
-1 & 3 & 2 & -1 & 4 \\
2 & -4 & -3 & 2 & -7 \\
-2 & 5 & 3 & -2 & 8
\end{bmatrix}
\qquad
Y = \begin{bmatrix}
-1 & 2 & 2 & 0 & 0 \\
-3 & 6 & 8 & -1 & 1 \\
0 & 1 & -2 & -2 & 9 \\
-1 & 4 & -3 & -3 & 16
\end{bmatrix}$$


Section  PDM 

Properties  of  Determinants  of  Matrices 

We  have  seen  how  to  compute  the  determinant  of  a matrix,  and  the  incredible  fact 
that  we  can  perform  expansion  about  any  row  or  column  to  make  this  computation. 
In  this  largely  theoretical  section,  we  will  state  and  prove  several  more  intriguing 
properties  about  determinants.  Our  main  goal  will  be  the  two  results  in  Theorem 
SMZD  and  Theorem  DRMM,  but  more  specifically,  we  will  see  how  the  value  of 
a determinant  will  allow  us  to  gain  insight  into  the  various  properties  of  a square 
matrix. 

Subsection  DRO 

Determinants  and  Row  Operations 

We  start  easy  with  a straightforward  theorem  whose  proof  presages  the  style  of 
subsequent  proofs  in  this  subsection. 

Theorem  DZRC  Determinant  with  Zero  Row  or  Column 

Suppose that $A$ is a square matrix with a row where every entry is zero, or a column where every entry is zero. Then $\det(A) = 0$.

Proof. Suppose that $A$ is a square matrix of size $n$ and row $i$ has every entry equal to zero. We compute $\det(A)$ via expansion about row $i$.

\begin{align*}
\det(A) &= \sum_{j=1}^{n} (-1)^{i+j}[A]_{ij}\det(A(i|j)) && \text{Theorem DER} \\
&= \sum_{j=1}^{n} (-1)^{i+j}\, 0 \det(A(i|j)) && \text{Row $i$ is zeros} \\
&= \sum_{j=1}^{n} 0 = 0
\end{align*}

The  proof  for  the  case  of  a zero  column  is  entirely  similar,  or  could  be  derived 
from  an  application  of  Theorem  DT  employing  the  transpose  of  the  matrix.  ■ 

Theorem  DRCS  Determinant  for  Row  or  Column  Swap 

Suppose that $A$ is a square matrix. Let $B$ be the square matrix obtained from $A$ by interchanging the location of two rows, or interchanging the location of two columns. Then $\det(B) = -\det(A)$.

Proof. Begin with the special case where $A$ is a square matrix of size $n$ and we form $B$ by swapping adjacent rows $i$ and $i + 1$ for some $1 \leq i \leq n - 1$. Notice that the assumption about swapping adjacent rows means that $B(i+1|j) = A(i|j)$ for all $1 \leq j \leq n$, and $[B]_{i+1,j} = [A]_{ij}$ for all $1 \leq j \leq n$. We compute $\det(B)$ via expansion about row $i + 1$.

\begin{align*}
\det(B) &= \sum_{j=1}^{n} (-1)^{(i+1)+j}[B]_{i+1,j}\det(B(i+1|j)) && \text{Theorem DER} \\
&= \sum_{j=1}^{n} (-1)^{(i+1)+j}[A]_{ij}\det(A(i|j)) && \text{Hypothesis} \\
&= \sum_{j=1}^{n} (-1)^{1}(-1)^{i+j}[A]_{ij}\det(A(i|j)) \\
&= (-1)\sum_{j=1}^{n} (-1)^{i+j}[A]_{ij}\det(A(i|j)) \\
&= -\det(A) && \text{Theorem DER}
\end{align*}

So the result holds for the special case where we swap adjacent rows of the matrix. As any computer scientist knows, we can accomplish any rearrangement of an ordered list by swapping adjacent elements. This principle can be demonstrated by naive sorting algorithms such as "bubble sort." In any event, we do not need to discuss every possible reordering, we just need to consider a swap of two rows, say rows $s$ and $t$ with $1 \leq s < t \leq n$.

Begin with row $s$, and repeatedly swap it with each row just below it, including row $t$ and stopping there. This will total $t - s$ swaps. Now swap the former row $t$, which currently lives in row $t - 1$, with each row above it, stopping when it becomes row $s$. This will total another $t - s - 1$ swaps. In this way, we create $B$ through a sequence of $2(t - s) - 1$ swaps of adjacent rows, each of which adjusts $\det(A)$ by a multiplicative factor of $-1$. So

$$\det(B) = (-1)^{2(t-s)-1}\det(A) = \left((-1)^2\right)^{t-s}(-1)^{-1}\det(A) = -\det(A)$$

as desired.

The proof for the case of swapping two columns is entirely similar, or could be derived from an application of Theorem DT employing the transpose of the matrix. ■
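The counting argument in this proof, that a swap of rows $s$ and $t$ can be realized by $2(t - s) - 1$ swaps of adjacent rows, is easy to simulate. The sketch below is a hypothetical Python helper (`swap_via_adjacent` is not a name from the text) that works on any list standing in for the rows of a matrix and reports how many adjacent swaps it used.

```python
def swap_via_adjacent(rows, s, t):
    """Swap positions s and t (1-based, s < t) using only adjacent swaps.

    Returns the rearranged list together with the number of adjacent swaps.
    """
    rows = list(rows)
    count = 0
    # Push the entry in position s down to position t: t - s adjacent swaps.
    for k in range(s, t):
        rows[k - 1], rows[k] = rows[k], rows[k - 1]
        count += 1
    # The former entry t now sits in position t - 1; lift it up to
    # position s: another t - s - 1 adjacent swaps.
    for k in range(t - 1, s, -1):
        rows[k - 2], rows[k - 1] = rows[k - 1], rows[k - 2]
        count += 1
    return rows, count


rows = ["r1", "r2", "r3", "r4", "r5", "r6"]
result, count = swap_via_adjacent(rows, 2, 5)
print(result, count)  # ['r1', 'r5', 'r3', 'r4', 'r2', 'r6'] 5
```

The count is always odd, which is why the overall sign change is $-1$.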


So  Theorem  DRCS  tells  us  the  effect  of  the  first  row  operation  (Definition  RO) 
on  the  determinant  of  a matrix.  Here  is  the  effect  of  the  second  row  operation. 

Theorem  DRCM  Determinant  for  Row  or  Column  Multiples 
Suppose  that  A is  a square  matrix.  Let  B be  the  square  matrix  obtained  from  A by 
multiplying  a single  row  by  the  scalar  a,  or  by  multiplying  a single  column  by  the 
scalar  a.  Then  det  ( B ) = a det  (A). 

Proof. Suppose that A is a square matrix of size n and we form the square matrix
B by multiplying each entry of row i of A by a. Notice that the other rows of A
and B are equal, so $A(i|j) = B(i|j)$, for all $1 \leq j \leq n$. We compute det(B) via
expansion about row i.

\begin{align*}
\det(B) &= \sum_{j=1}^{n} (-1)^{i+j} [B]_{ij} \det(B(i|j)) && \text{Theorem DER} \\
&= \sum_{j=1}^{n} (-1)^{i+j} [B]_{ij} \det(A(i|j)) && \text{Hypothesis} \\
&= \sum_{j=1}^{n} (-1)^{i+j} a [A]_{ij} \det(A(i|j)) && \text{Hypothesis} \\
&= a \sum_{j=1}^{n} (-1)^{i+j} [A]_{ij} \det(A(i|j)) \\
&= a \det(A) && \text{Theorem DER}
\end{align*}

The  proof  for  the  case  of  a multiple  of  a column  is  entirely  similar,  or  could  be 
derived  from  an  application  of  Theorem  DT  employing  the  transpose  of  the  matrix. 


Let us go for understanding the effect of all three row operations. First we
need an intermediate result, but it is an easy one.

Theorem  DERC  Determinant  with  Equal  Rows  or  Columns 

Suppose  that  A is  a square  matrix  with  two  equal  rows,  or  two  equal  columns.  Then 

det  (A)  = 0. 


Proof.  Suppose  that  A is  a square  matrix  of  size  n where  the  two  rows  s and  t are 
equal.  Form  the  matrix  B by  swapping  rows  s and  t.  Notice  that  as  a consequence 
of  our  hypothesis,  A = B.  Then 

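\begin{align*}
\det(A) &= \frac{1}{2}\left(\det(A) + \det(A)\right) \\
&= \frac{1}{2}\left(\det(A) - \det(B)\right) && \text{Theorem DRCS} \\
&= \frac{1}{2}\left(\det(A) - \det(A)\right) && \text{Hypothesis, } A = B \\
&= \frac{1}{2}(0) = 0
\end{align*}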

The  proof  for  the  case  of  two  equal  columns  is  entirely  similar,  or  could  be  derived 
from  an  application  of  Theorem  DT  employing  the  transpose  of  the  matrix.  ■ 


Now  explain  the  third  row  operation.  Here  we  go. 
Theorem DRCMA Determinant for Row or Column Multiples and Addition
Suppose that A is a square matrix. Let B be the square matrix obtained from A
by multiplying a row by the scalar a and then adding it to another row, or by
multiplying a column by the scalar a and then adding it to another column. Then
det(B) = det(A).

Proof. Suppose that A is a square matrix of size n. Form the matrix B by multiplying
row s by a and adding it to row t. Let C be the auxiliary matrix where we replace
row t of A by row s of A. Notice that $A(t|j) = B(t|j) = C(t|j)$ for all $1 \leq j \leq n$.
We compute the determinant of B by expansion about row t.

\begin{align*}
\det(B) &= \sum_{j=1}^{n} (-1)^{t+j} [B]_{tj} \det(B(t|j)) && \text{Theorem DER} \\
&= \sum_{j=1}^{n} (-1)^{t+j} \left(a [A]_{sj} + [A]_{tj}\right) \det(B(t|j)) && \text{Hypothesis} \\
&= \sum_{j=1}^{n} (-1)^{t+j} a [A]_{sj} \det(B(t|j)) + \sum_{j=1}^{n} (-1)^{t+j} [A]_{tj} \det(B(t|j)) \\
&= a \sum_{j=1}^{n} (-1)^{t+j} [A]_{sj} \det(B(t|j)) + \sum_{j=1}^{n} (-1)^{t+j} [A]_{tj} \det(B(t|j)) \\
&= a \sum_{j=1}^{n} (-1)^{t+j} [C]_{tj} \det(C(t|j)) + \sum_{j=1}^{n} (-1)^{t+j} [A]_{tj} \det(A(t|j)) \\
&= a \det(C) + \det(A) && \text{Theorem DER} \\
&= a \cdot 0 + \det(A) = \det(A) && \text{Theorem DERC}
\end{align*}

The  proof  for  the  case  of  adding  a multiple  of  a column  is  entirely  similar,  or 
could  be  derived  from  an  application  of  Theorem  DT  employing  the  transpose  of 
the  matrix.  ■ 
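
The three theorems about row operations are easy to spot-check numerically. What follows is a minimal sketch in Python, not a proof: the matrix `A`, the helper names, and the 0-based row indices are our own choices for illustration, and the determinant is computed by expansion about the first row as in Theorem DER.

```python
def det(M):
    """Determinant by cofactor expansion about the first row (Theorem DER)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        # Submatrix M(1|j): delete the first row and column j.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def swap_rows(M, s, t):
    """First row operation: swap rows s and t (0-based)."""
    B = [row[:] for row in M]
    B[s], B[t] = B[t], B[s]
    return B


def scale_row(M, i, a):
    """Second row operation: multiply row i by the scalar a."""
    B = [row[:] for row in M]
    B[i] = [a * entry for entry in B[i]]
    return B


def add_multiple(M, s, t, a):
    """Third row operation: add a times row s to row t."""
    B = [row[:] for row in M]
    B[t] = [a * x + y for x, y in zip(M[s], M[t])]
    return B


A = [[1, 2, 0],
     [3, -1, 4],
     [2, 5, 1]]

print(det(A))                         # -11
print(det(swap_rows(A, 0, 2)))        # 11,  Theorem DRCS
print(det(scale_row(A, 1, 5)))        # -55, Theorem DRCM
print(det(add_multiple(A, 0, 2, 3)))  # -11, Theorem DRCMA
```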


Is  this  what  you  expected?  We  could  argue  that  the  third  row  operation  is  the 
most  popular,  and  yet  it  has  no  effect  whatsoever  on  the  determinant  of  a matrix! 
We can exploit this, along with our understanding of the other two row operations,
to  provide  another  approach  to  computing  a determinant.  We’ll  explain  this  in  the 
context  of  an  example. 


Example  DRO  Determinant  by  row  operations 
Suppose we desire the determinant of the 4 × 4 matrix

\[ A = \begin{bmatrix} 2 & 0 & 2 & 3 \\ 1 & 3 & -1 & 1 \\ -1 & 1 & -1 & 2 \\ 3 & 5 & 4 & 0 \end{bmatrix} \]


We  will  perform  a sequence  of  row  operations  on  this  matrix,  shooting  for  an  upper 
triangular  matrix,  whose  determinant  will  be  simply  the  product  of  its  diagonal 
entries.  For  each  row  operation,  we  will  track  the  effect  on  the  determinant  via 
Theorem  DRCS,  Theorem  DRCM,  Theorem  DRCMA. 


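\begin{align*}
A &= \begin{bmatrix} 2 & 0 & 2 & 3 \\ 1 & 3 & -1 & 1 \\ -1 & 1 & -1 & 2 \\ 3 & 5 & 4 & 0 \end{bmatrix} \\
\xrightarrow{R_1 \leftrightarrow R_2} A_1 &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 2 & 0 & 2 & 3 \\ -1 & 1 & -1 & 2 \\ 3 & 5 & 4 & 0 \end{bmatrix} & \det(A) &= -\det(A_1) && \text{Theorem DRCS} \\
\xrightarrow{-2R_1 + R_2} A_2 &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & -6 & 4 & 1 \\ -1 & 1 & -1 & 2 \\ 3 & 5 & 4 & 0 \end{bmatrix} & &= -\det(A_2) && \text{Theorem DRCMA} \\
\xrightarrow{1R_1 + R_3} A_3 &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & -6 & 4 & 1 \\ 0 & 4 & -2 & 3 \\ 3 & 5 & 4 & 0 \end{bmatrix} & &= -\det(A_3) && \text{Theorem DRCMA} \\
\xrightarrow{-3R_1 + R_4} A_4 &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & -6 & 4 & 1 \\ 0 & 4 & -2 & 3 \\ 0 & -4 & 7 & -3 \end{bmatrix} & &= -\det(A_4) && \text{Theorem DRCMA} \\
\xrightarrow{1R_3 + R_2} A_5 &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & -2 & 2 & 4 \\ 0 & 4 & -2 & 3 \\ 0 & -4 & 7 & -3 \end{bmatrix} & &= -\det(A_5) && \text{Theorem DRCMA} \\
\xrightarrow{-\frac{1}{2}R_2} A_6 &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & 1 & -1 & -2 \\ 0 & 4 & -2 & 3 \\ 0 & -4 & 7 & -3 \end{bmatrix} & &= 2\det(A_6) && \text{Theorem DRCM} \\
\xrightarrow{-4R_2 + R_3} A_7 &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & 1 & -1 & -2 \\ 0 & 0 & 2 & 11 \\ 0 & -4 & 7 & -3 \end{bmatrix} & &= 2\det(A_7) && \text{Theorem DRCMA} \\
\xrightarrow{4R_2 + R_4} A_8 &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & 1 & -1 & -2 \\ 0 & 0 & 2 & 11 \\ 0 & 0 & 3 & -11 \end{bmatrix} & &= 2\det(A_8) && \text{Theorem DRCMA} \\
\xrightarrow{-1R_3 + R_4} A_9 &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & 1 & -1 & -2 \\ 0 & 0 & 2 & 11 \\ 0 & 0 & 1 & -22 \end{bmatrix} & &= 2\det(A_9) && \text{Theorem DRCMA} \\
\xrightarrow{-2R_4 + R_3} A_{10} &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & 1 & -1 & -2 \\ 0 & 0 & 0 & 55 \\ 0 & 0 & 1 & -22 \end{bmatrix} & &= 2\det(A_{10}) && \text{Theorem DRCMA} \\
\xrightarrow{R_3 \leftrightarrow R_4} A_{11} &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & 1 & -1 & -2 \\ 0 & 0 & 1 & -22 \\ 0 & 0 & 0 & 55 \end{bmatrix} & &= -2\det(A_{11}) && \text{Theorem DRCS} \\
\xrightarrow{\frac{1}{55}R_4} A_{12} &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & 1 & -1 & -2 \\ 0 & 0 & 1 & -22 \\ 0 & 0 & 0 & 1 \end{bmatrix} & &= -110\det(A_{12}) && \text{Theorem DRCM}
\end{align*}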


The matrix $A_{12}$ is upper triangular, so expansion about the first column (repeatedly)
will result in $\det(A_{12}) = (1)(1)(1)(1) = 1$ (see Example DUTM) and thus,
$\det(A) = -110(1) = -110$.

Notice that our sequence of row operations was somewhat ad hoc, such as the
transformation to $A_5$. We could have been even more methodical, and strictly
followed the process that converts a matrix to reduced row-echelon form (Theorem
REMEF), eventually achieving the same numerical result with a final matrix that
equaled the 4 × 4 identity matrix. Notice too that we could have stopped with $A_8$,
since at this point we could compute $\det(A_8)$ by two expansions about first columns,
followed by a simple determinant of a 2 × 2 matrix (Theorem DMST).

The beauty of this approach is that computationally we should already have
written a procedure to convert matrices to reduced row-echelon form, so all we
need to do is track the multiplicative changes to the determinant as the algorithm
proceeds. Further, for a square matrix of size n this approach requires on the order
of $n^3$ multiplications, while a recursive application of expansion about a row or
column (Theorem DER, Theorem DEC) will require in the vicinity of $(n-1)(n!)$
multiplications. So even for very small matrices, a computational approach utilizing
row operations will have superior run-time. Tracking, and controlling, the effects of
round-off errors is another story, best saved for a numerical linear algebra course. △
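
The bookkeeping of Example DRO can be automated. The following Python sketch is one possible arrangement and not the procedure of the example itself: it chooses its own pivots, so the intermediate matrices differ from $A_1, \dots, A_{12}$ above. The function name `det_by_row_ops` is hypothetical, and exact rational arithmetic (`fractions.Fraction`) is used so that round-off plays no role.

```python
from fractions import Fraction


def det_by_row_ops(M):
    """Row-reduce M to an upper triangular matrix with ones on the diagonal,
    tracking the multiplicative changes to the determinant along the way."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    factor = Fraction(1)  # invariant: det(M) = factor * det(A)
    for c in range(n):
        # Locate a row at or below row c with a nonzero entry in column c.
        p = next((r for r in range(c, n) if A[r][c] != 0), None)
        if p is None:
            return Fraction(0)  # no pivot available: the matrix is singular
        if p != c:
            A[c], A[p] = A[p], A[c]
            factor = -factor          # Theorem DRCS
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        factor *= pivot               # Theorem DRCM
        for r in range(c + 1, n):
            m = A[r][c]
            # Theorem DRCMA: this operation leaves the determinant unchanged.
            A[r] = [x - m * y for x, y in zip(A[r], A[c])]
    # A is now upper triangular with diagonal entries 1, so det(A) = 1.
    return factor


A = [[2, 0, 2, 3],
     [1, 3, -1, 1],
     [-1, 1, -1, 2],
     [3, 5, 4, 0]]
print(det_by_row_ops(A))  # -110
```

Even though the sequence of row operations is different, the tracked factor arrives at the same value, $-110$.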


Subsection DROEM

Determinants, Row Operations, Elementary Matrices

As a final preparation for our two most important theorems about determinants,
we prove a handful of facts about the interplay of row operations and matrix
multiplication with elementary matrices with regard to the determinant. But first, a
simple, but crucial, fact about the identity matrix.

Theorem DIM Determinant of the Identity Matrix
For every $n \geq 1$, $\det(I_n) = 1$.

Proof. It may be overkill, but this is a good situation to run through a proof by
induction on n (Proof Technique I). Is the result true when n = 1? Yes,

\begin{align*}
\det(I_1) &= [I_1]_{11} && \text{Definition DM} \\
&= 1 && \text{Definition IM}
\end{align*}

Now assume the theorem is true for the identity matrix of size $n - 1$ and investigate
the determinant of the identity matrix of size n with expansion about row 1,

\begin{align*}
\det(I_n) &= \sum_{j=1}^{n} (-1)^{1+j} [I_n]_{1j} \det(I_n(1|j)) && \text{Definition DM} \\
&= (-1)^{1+1} [I_n]_{11} \det(I_n(1|1)) + \sum_{j=2}^{n} (-1)^{1+j} [I_n]_{1j} \det(I_n(1|j)) \\
&= 1 \det(I_{n-1}) + \sum_{j=2}^{n} (-1)^{1+j}\, 0 \det(I_n(1|j)) && \text{Definition IM} \\
&= 1(1) + \sum_{j=2}^{n} 0 = 1 && \text{Induction Hypothesis}
\end{align*}
■


Theorem  DEM  Determinants  of  Elementary  Matrices 

For  the  three  possible  versions  of  an  elementary  matrix  (Definition  ELEM)  we  have 
the  determinants, 
1. $\det(E_{i,j}) = -1$

2. $\det(E_i(a)) = a$

3. $\det(E_{i,j}(a)) = 1$

Proof. Swapping rows i and j of the identity matrix will create $E_{i,j}$ (Definition
ELEM), so

\begin{align*}
\det(E_{i,j}) &= -\det(I_n) && \text{Theorem DRCS} \\
&= -1 && \text{Theorem DIM}
\end{align*}

Multiplying row i of the identity matrix by a will create $E_i(a)$ (Definition
ELEM), so

\begin{align*}
\det(E_i(a)) &= a \det(I_n) && \text{Theorem DRCM} \\
&= a(1) = a && \text{Theorem DIM}
\end{align*}

Multiplying row i of the identity matrix by a and adding to row j will create
$E_{i,j}(a)$ (Definition ELEM), so

\begin{align*}
\det(E_{i,j}(a)) &= \det(I_n) && \text{Theorem DRCMA} \\
&= 1 && \text{Theorem DIM}
\end{align*}
■


Theorem  DEMMM  Determinants,  Elementary  Matrices,  Matrix  Multiplication 
Suppose  that  A is  a square  matrix  of  size  n and  E is  any  elementary  matrix  of  size 
n.  Then 

det  (EA)  = det  (E)  det  (A) 

Proof. The proof proceeds in three parts, one for each type of elementary matrix,
with each part very similar to the other two.

First, let B be the matrix obtained from A by swapping rows i and j,

\begin{align*}
\det(E_{i,j} A) &= \det(B) && \text{Theorem EMDRO} \\
&= -\det(A) && \text{Theorem DRCS} \\
&= \det(E_{i,j}) \det(A) && \text{Theorem DEM}
\end{align*}

Second, let B be the matrix obtained from A by multiplying row i by a,

\begin{align*}
\det(E_i(a) A) &= \det(B) && \text{Theorem EMDRO} \\
&= a \det(A) && \text{Theorem DRCM} \\
&= \det(E_i(a)) \det(A) && \text{Theorem DEM}
\end{align*}

Third, let B be the matrix obtained from A by multiplying row i by a and adding
to row j,

\begin{align*}
\det(E_{i,j}(a) A) &= \det(B) && \text{Theorem EMDRO} \\
&= \det(A) && \text{Theorem DRCMA} \\
&= \det(E_{i,j}(a)) \det(A) && \text{Theorem DEM}
\end{align*}

Since the desired result holds for each variety of elementary matrix individually,
we are done. ■
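
Theorem DEM and Theorem DEMMM can be spot-checked on a small case. The sketch below is ours and makes several assumptions for illustration: rows are indexed from 0, the helper names (`elem_swap`, `elem_scale`, `elem_add`) are hypothetical, the scalar is taken to be 7, and the test matrix `A` is arbitrary.

```python
def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def elem_swap(n, i, j):
    """E_{i,j}: the identity matrix with rows i and j swapped."""
    E = identity(n)
    E[i], E[j] = E[j], E[i]
    return E


def elem_scale(n, i, a):
    """E_i(a): the identity matrix with row i multiplied by a (a nonzero)."""
    E = identity(n)
    E[i][i] = a
    return E


def elem_add(n, i, j, a):
    """E_{i,j}(a): the identity matrix with a times row i added to row j."""
    E = identity(n)
    E[j][i] = a
    return E


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def det(M):
    """Cofactor expansion about the first row (Theorem DER)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] *
               det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


A = [[1, 2, 0],
     [3, -1, 4],
     [2, 5, 1]]
elementary = [elem_swap(3, 0, 2), elem_scale(3, 1, 7), elem_add(3, 0, 2, 7)]

print([det(E) for E in elementary])  # [-1, 7, 1]
for E in elementary:
    print(det(matmul(E, A)) == det(E) * det(A))  # True each time
```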

Subsection  DNMMM 

Determinants,  Nonsingular  Matrices,  Matrix  Multiplication 

If  you  asked  someone  with  substantial  experience  working  with  matrices  about  the 
value  of  the  determinant,  they’d  be  likely  to  quote  the  following  theorem  as  the  first 
thing  to  come  to  mind. 

Theorem  SMZD  Singular  Matrices  have  Zero  Determinants 

Let  A be  a square  matrix.  Then  A is  singular  if  and  only  if  det  (A)  = 0. 

Proof.  Rather  than  jumping  into  the  two  halves  of  the  equivalence,  we  first  establish 
a few  items.  Let  B be  the  unique  square  matrix  that  is  row-equivalent  to  A and 
in  reduced  row-echelon  form  (Theorem  REMEF,  Theorem  RREFU).  For  each  of 
the  row  operations  that  converts  B into  A,  there  is  an  elementary  matrix  Ei  which 
effects  the  row  operation  by  matrix  multiplication  (Theorem  EMDRO).  Repeated 
applications  of  Theorem  EMDRO  allow  us  to  write 

\[ A = E_s E_{s-1} \dots E_2 E_1 B \]

Then

\begin{align*}
\det(A) &= \det(E_s E_{s-1} \dots E_2 E_1 B) \\
&= \det(E_s) \det(E_{s-1}) \dots \det(E_2) \det(E_1) \det(B) && \text{Theorem DEMMM}
\end{align*}

From Theorem DEM we can infer that the determinant of an elementary matrix
is never zero (note the ban on a = 0 for $E_i(a)$ in Definition ELEM). So the product
on the right is composed of nonzero scalars, with the possible exception of det(B).
More precisely, we can argue that det(A) = 0 if and only if det(B) = 0. With this
established, we can take up the two halves of the equivalence.

(=>)  If  A is  singular,  then  by  Theorem  NMRRI,  B cannot  be  the  identity  matrix. 
Because  (1)  the  number  of  pivot  columns  is  equal  to  the  number  of  nonzero  rows,  (2) 
not  every  column  is  a pivot  column,  and  (3)  B is  square,  we  see  that  B must  have  a 
zero  row.  By  Theorem  DZRC  the  determinant  of  B is  zero,  and  by  the  above,  we 
conclude  that  the  determinant  of  A is  zero. 
(<=) We will prove the contrapositive (Proof Technique CP). So assume A is
nonsingular, then by Theorem NMRRI, B is the identity matrix and Theorem DIM
tells us that $\det(B) = 1 \neq 0$. With the argument above, we conclude that the
determinant of A is nonzero as well. ■

For  the  case  of  2 x 2 matrices  you  might  compare  the  application  of  Theorem 
SMZD  with  the  combination  of  the  results  stated  in  Theorem  DMST  and  Theorem 
TTMI. 

Example  ZNDAB  Zero  and  nonzero  determinant,  Archetypes  A and  B 
The coefficient matrix in Archetype A has a zero determinant (check this!) while the
coefficient matrix in Archetype B has a nonzero determinant (check this, too). These
matrices are singular and nonsingular, respectively. This is exactly what Theorem
SMZD says, and continues our list of contrasts between these two archetypes. △

In  Section  MINM  we  said  “singular  matrices  are  a distinct  minority.”  If  you  built 
a random  matrix  and  took  its  determinant,  how  likely  would  it  be  that  you  got  zero? 

Since Theorem SMZD is an equivalence (Proof Technique E) we can expand on
our growing list of equivalences about nonsingular matrices. The addition of the
condition $\det(A) \neq 0$ is one of the best motivations for learning about determinants.

Theorem  NME7  Nonsingular  Matrix  Equivalences,  Round  7 
Suppose  that  A is  a square  matrix  of  size  n.  The  following  are  equivalent. 

1. A is nonsingular.

2. A row-reduces to the identity matrix.

3. The null space of A contains only the zero vector, $\mathcal{N}(A) = \{\mathbf{0}\}$.

4. The linear system $\mathcal{LS}(A, \mathbf{b})$ has a unique solution for every possible choice of $\mathbf{b}$.

5. The columns of A are a linearly independent set.

6. A is invertible.

7. The column space of A is $\mathbb{C}^n$, $\mathcal{C}(A) = \mathbb{C}^n$.

8. The columns of A are a basis for $\mathbb{C}^n$.

9. The rank of A is n, $r(A) = n$.

10. The nullity of A is zero, $n(A) = 0$.

11. The determinant of A is nonzero, $\det(A) \neq 0$.

Proof. Theorem SMZD says A is singular if and only if det(A) = 0. If we negate
each of these statements, we arrive at two contrapositives that we can combine as
the equivalence, A is nonsingular if and only if $\det(A) \neq 0$. This allows us to add a
new statement to the list found in Theorem NME6. ■

Computationally,  row-reducing  a matrix  is  the  most  efficient  way  to  determine 
if  a matrix  is  nonsingular,  though  the  effect  of  using  division  in  a computer  can 
lead  to  round-off  errors  that  confuse  small  quantities  with  critical  zero  quantities. 
Conceptually,  the  determinant  may  seem  the  most  efficient  way  to  determine  if  a 
matrix  is  nonsingular.  The  definition  of  a determinant  uses  just  addition,  subtraction 
and  multiplication,  so  division  is  never  a problem.  And  the  final  test  is  easy:  is  the 
determinant  zero  or  not?  However,  the  number  of  operations  involved  in  computing  a 
determinant  by  the  definition  very  quickly  becomes  so  excessive  as  to  be  impractical. 
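
To get a feel for how quickly the work grows, we can tabulate the two rough multiplication counts quoted in Example DRO: on the order of $n^3$ for row operations and in the vicinity of $(n-1)(n!)$ for recursive expansion. The Python sketch below only evaluates these two estimates (the function names are ours); they ignore constant factors, so the table is a guide to growth rates rather than a timing of any actual program.

```python
from math import factorial


def row_ops_estimate(n):
    """Rough multiplication count for row-reducing a matrix of size n."""
    return n ** 3


def expansion_estimate(n):
    """Rough multiplication count for recursive expansion about a row."""
    return (n - 1) * factorial(n)


print(" n   row ops    expansion")
for n in range(2, 11):
    print(f"{n:2d} {row_ops_estimate(n):9d} {expansion_estimate(n):12d}")

# Smallest size at which the expansion estimate exceeds the row-operation one.
crossover = next(n for n in range(2, 50)
                 if expansion_estimate(n) > row_ops_estimate(n))
print(crossover)  # 4
```

By these estimates the two counts are comparable for sizes 2 and 3, and from size 4 on the factorial term dominates; at size 10 the comparison is 1000 against 32659200.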

Now  for  the  coup  de  grace.  We  will  generalize  Theorem  DEMMM  to  the  case  of  any 
two  square  matrices.  You  may  recall  thinking  that  matrix  multiplication  was  defined 
in  a needlessly  complicated  manner.  For  sure,  the  definition  of  a determinant  seems 
even  stranger.  (Though  Theorem  SMZD  might  be  forcing  you  to  reconsider.)  Read 
the  statement  of  the  next  theorem  and  contemplate  how  nicely  matrix  multiplication 
and  determinants  play  with  each  other. 

Theorem  DRMM  Determinant  Respects  Matrix  Multiplication 

Suppose that A and B are square matrices of the same size. Then
det(AB) = det(A) det(B).

Proof.  This  proof  is  constructed  in  two  cases.  First,  suppose  that  A is  singular. 
Then  det  (A)  = 0 by  Theorem  SMZD.  By  the  contrapositive  of  Theorem  NPNT, 
AB  is  singular  as  well.  So  by  a second  application  of  Theorem  SMZD,  det  ( AB ) = 0. 
Putting  it  all  together 

\[ \det(AB) = 0 = 0 \det(B) = \det(A) \det(B) \]

as  desired. 

For  the  second  case,  suppose  that  A is  nonsingular.  By  Theorem  NMPEM  there 
are elementary matrices $E_1, E_2, E_3, \dots, E_s$ such that $A = E_1 E_2 E_3 \dots E_s$. Then

\begin{align*}
\det(AB) &= \det(E_1 E_2 E_3 \dots E_s B) \\
&= \det(E_1) \det(E_2) \det(E_3) \dots \det(E_s) \det(B) && \text{Theorem DEMMM} \\
&= \det(E_1 E_2 E_3 \dots E_s) \det(B) && \text{Theorem DEMMM} \\
&= \det(A) \det(B)
\end{align*}
■


It  is  amazing  that  matrix  multiplication  and  the  determinant  interact  this  way. 
Might it also be true that det(A + B) = det(A) + det(B)? (Exercise PDM.M30)
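
A numerical check of Theorem DRMM is no substitute for the proof, but it is reassuring. In the Python sketch below the matrices `A`, `B` and `S` are arbitrary choices of ours (with `S` chosen singular to exercise the first case of the proof), and the determinant is again computed by expansion about the first row.

```python
def det(M):
    """Cofactor expansion about the first row (Theorem DER)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] *
               det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


A = [[1, 2, 0],
     [3, -1, 4],
     [2, 5, 1]]
B = [[2, 0, 1],
     [1, 1, 0],
     [0, 3, 1]]
S = [[1, 2, 3],   # singular: the second row is twice the first
     [2, 4, 6],
     [0, 1, 1]]

print(det(A), det(B), det(matmul(A, B)))  # -11 5 -55
print(det(S), det(matmul(S, B)))          # 0 0
```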

Reading Questions

1. Consider the two matrices below, and suppose you already have computed $\det(A) = -120$.
What is det(B)? Why?

\[ A = \begin{bmatrix} 0 & 8 & 3 & -4 \\ -1 & 2 & -2 & 5 \\ -2 & 8 & 4 & 3 \\ 0 & -4 & 2 & -3 \end{bmatrix} \qquad B = \begin{bmatrix} 0 & 8 & 3 & -4 \\ 0 & -4 & 2 & -3 \\ -2 & 8 & 4 & 3 \\ -1 & 2 & -2 & 5 \end{bmatrix} \]


2.  State  the  theorem  that  allows  us  to  make  yet  another  extension  to  our  NMEx  series  of 
theorems. 

3. What is amazing about the interaction between matrix multiplication and the
determinant?

Exercises 

C30  Each  of  the  archetypes  below  is  a system  of  equations  with  a square  coefficient 
matrix,  or  is  a square  matrix  itself.  Compute  the  determinant  of  each  matrix,  noting  how 
Theorem  SMZD  indicates  when  the  matrix  is  singular  or  nonsingular. 

Archetype A, Archetype B, Archetype F, Archetype K, Archetype L

M20† Construct a 3 × 3 nonsingular matrix and call it A. Then, for each entry of the
matrix, compute the corresponding cofactor, and create a new 3 × 3 matrix full of these
cofactors by placing the cofactor of an entry in the same location as the entry it was based
on. Once complete, call this matrix C. Compute $AC^t$. Any observations? Repeat with a
new matrix, or perhaps with a 4 × 4 matrix.

M30  Construct  an  example  to  show  that  the  following  statement  is  not  true  for  all  square 
matrices  A and  B of  the  same  size:  det  (A  + B)  = det  (A)  + det  ( B ). 

T10  Theorem  NPNT  says  that  if  the  product  of  square  matrices  AB  is  nonsingular,  then 
the  individual  matrices  A and  B are  nonsingular  also.  Construct  a new  proof  of  this  result 
making  use  of  theorems  about  determinants  of  matrices. 

T15  Use  Theorem  DRCM  to  prove  Theorem  DZRC  as  a corollary.  (See  Proof  Technique 
LC.) 

T20 Suppose that A is a square matrix of size n and $a \in \mathbb{C}$ is a scalar. Prove that
$\det(aA) = a^n \det(A)$.

T25  Employ  Theorem  DT  to  construct  the  second  half  of  the  proof  of  Theorem  DRCM 
(the  portion  about  a multiple  of  a column). 


Chapter  E 
Eigenvalues 


When we have a square matrix of size n, A, and we multiply it by a vector x from $\mathbb{C}^n$
to form the matrix-vector product (Definition MVP), the result is another vector in
$\mathbb{C}^n$. So we can adopt a functional view of this computation: the act of multiplying
by a square matrix is a function that converts one vector (x) into another one (Ax)
of the same size. For some vectors, this seemingly complicated computation is really
no more complicated than scalar multiplication. The vectors vary according to the
choice of A, so the question is to determine, for an individual choice of A, if there
are any such vectors, and if so, which ones. It happens in a variety of situations that
these vectors (and the scalars that go along with them) are of special interest.

We will be solving polynomial equations in this chapter, which raises the specter of
complex  numbers  as  roots.  This  distinct  possibility  is  our  main  reason  for  entertaining 
the  complex  numbers  throughout  the  course.  You  might  be  moved  to  revisit  Section 
CNO  and  Section  O. 


Section  EE 

Eigenvalues  and  Eigenvectors 

In  this  section,  we  will  define  the  eigenvalues  and  eigenvectors  of  a matrix,  and  see 
how  to  compute  them.  More  theoretical  properties  will  be  taken  up  in  the  next 
section. 

Subsection  EEM 

Eigenvalues  and  Eigenvectors  of  a Matrix 

We  start  with  the  principal  definition  for  this  chapter. 

Definition  EEM  Eigenvalues  and  Eigenvectors  of  a Matrix 

Suppose that A is a square matrix of size n, $\mathbf{x} \neq \mathbf{0}$ is a vector in $\mathbb{C}^n$, and $\lambda$ is a
scalar in $\mathbb{C}$. Then we say $\mathbf{x}$ is an eigenvector of A with eigenvalue $\lambda$ if

\[ A\mathbf{x} = \lambda\mathbf{x} \]


Before  going  any  further,  perhaps  we  should  convince  you  that  such  things  ever 
happen  at  all.  Understand  the  next  example,  but  do  not  concern  yourself  with  where 
the  pieces  come  from.  We  will  have  methods  soon  enough  to  be  able  to  discover 
these  eigenvectors  ourselves. 

Example  SEE  Some  eigenvalues  and  eigenvectors 
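Consider the matrix

\[ A = \begin{bmatrix} 204 & 98 & -26 & -10 \\ -280 & -134 & 36 & 14 \\ 716 & 348 & -90 & -36 \\ -472 & -232 & 60 & 28 \end{bmatrix} \]

and the vectors

\[ \mathbf{x} = \begin{bmatrix} 1 \\ -1 \\ 2 \\ 5 \end{bmatrix} \qquad \mathbf{y} = \begin{bmatrix} -3 \\ 4 \\ -10 \\ 4 \end{bmatrix} \qquad \mathbf{z} = \begin{bmatrix} -3 \\ 7 \\ 0 \\ 8 \end{bmatrix} \qquad \mathbf{w} = \begin{bmatrix} 1 \\ -1 \\ 4 \\ 0 \end{bmatrix} \]

Then

\[ A\mathbf{x} = \begin{bmatrix} 204 & 98 & -26 & -10 \\ -280 & -134 & 36 & 14 \\ 716 & 348 & -90 & -36 \\ -472 & -232 & 60 & 28 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ 2 \\ 5 \end{bmatrix} = \begin{bmatrix} 4 \\ -4 \\ 8 \\ 20 \end{bmatrix} = 4 \begin{bmatrix} 1 \\ -1 \\ 2 \\ 5 \end{bmatrix} = 4\mathbf{x} \]

so $\mathbf{x}$ is an eigenvector of A with eigenvalue $\lambda = 4$.
Also,

\[ A\mathbf{y} = \begin{bmatrix} 204 & 98 & -26 & -10 \\ -280 & -134 & 36 & 14 \\ 716 & 348 & -90 & -36 \\ -472 & -232 & 60 & 28 \end{bmatrix} \begin{bmatrix} -3 \\ 4 \\ -10 \\ 4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} = 0 \begin{bmatrix} -3 \\ 4 \\ -10 \\ 4 \end{bmatrix} = 0\mathbf{y} \]

so $\mathbf{y}$ is an eigenvector of A with eigenvalue $\lambda = 0$.
Also,

\[ A\mathbf{z} = \begin{bmatrix} 204 & 98 & -26 & -10 \\ -280 & -134 & 36 & 14 \\ 716 & 348 & -90 & -36 \\ -472 & -232 & 60 & 28 \end{bmatrix} \begin{bmatrix} -3 \\ 7 \\ 0 \\ 8 \end{bmatrix} = \begin{bmatrix} -6 \\ 14 \\ 0 \\ 16 \end{bmatrix} = 2 \begin{bmatrix} -3 \\ 7 \\ 0 \\ 8 \end{bmatrix} = 2\mathbf{z} \]

so $\mathbf{z}$ is an eigenvector of A with eigenvalue $\lambda = 2$.
Also,

\[ A\mathbf{w} = \begin{bmatrix} 204 & 98 & -26 & -10 \\ -280 & -134 & 36 & 14 \\ 716 & 348 & -90 & -36 \\ -472 & -232 & 60 & 28 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ 4 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ -2 \\ 8 \\ 0 \end{bmatrix} = 2 \begin{bmatrix} 1 \\ -1 \\ 4 \\ 0 \end{bmatrix} = 2\mathbf{w} \]

so $\mathbf{w}$ is an eigenvector of A with eigenvalue $\lambda = 2$.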

So we have demonstrated four eigenvectors of A. Are there more? Yes, any
nonzero scalar multiple of an eigenvector is again an eigenvector. In this example,
set $\mathbf{u} = 30\mathbf{x}$. Then

\begin{align*}
A\mathbf{u} &= A(30\mathbf{x}) \\
&= 30A\mathbf{x} && \text{Theorem MMSMM} \\
&= 30(4\mathbf{x}) && \text{Definition EEM} \\
&= 4(30\mathbf{x}) && \text{Property SMAM} \\
&= 4\mathbf{u}
\end{align*}

so that $\mathbf{u}$ is also an eigenvector of A for the same eigenvalue, $\lambda = 4$.

The vectors $\mathbf{z}$ and $\mathbf{w}$ are both eigenvectors of A for the same eigenvalue $\lambda = 2$,
yet this is not as simple as the two vectors just being scalar multiples of each other
(they are not). Look what happens when we add them together, to form $\mathbf{v} = \mathbf{z} + \mathbf{w}$,
and multiply by A,

\begin{align*}
A\mathbf{v} &= A(\mathbf{z} + \mathbf{w}) \\
&= A\mathbf{z} + A\mathbf{w} && \text{Theorem MMDAA} \\
&= 2\mathbf{z} + 2\mathbf{w} && \text{Definition EEM} \\
&= 2(\mathbf{z} + \mathbf{w}) && \text{Property DVAC} \\
&= 2\mathbf{v}
\end{align*}

so that $\mathbf{v}$ is also an eigenvector of A for the eigenvalue $\lambda = 2$. So it would appear
that the set of eigenvectors that are associated with a fixed eigenvalue is closed under
the vector space operations of $\mathbb{C}^n$. Hmmm.

The vector $\mathbf{y}$ is an eigenvector of A for the eigenvalue $\lambda = 0$, so we can use
Theorem ZSSM to write $A\mathbf{y} = 0\mathbf{y} = \mathbf{0}$. But this also means that $\mathbf{y} \in \mathcal{N}(A)$. There
would appear to be a connection here also. △

Example SEE hints at a number of intriguing properties, and there are many
more. We will explore the general properties of eigenvalues and eigenvectors in
Section PEE, but in this section we will concern ourselves with the question of
actually computing eigenvalues and eigenvectors. First we need a bit of background
material on polynomials and matrices.

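
The claims of Example SEE amount to a handful of matrix-vector products, which are quick to confirm by machine. The Python sketch below is ours (the helper names `matvec`, `scale` and `is_eigenpair` are hypothetical); it checks the defining equation of Definition EEM for the vectors x, z and w of the example, for the scalar multiple u = 30x, and for the sum v = z + w.

```python
A = [[204, 98, -26, -10],
     [-280, -134, 36, 14],
     [716, 348, -90, -36],
     [-472, -232, 60, 28]]


def matvec(M, v):
    """Matrix-vector product (Definition MVP), computed entry by entry."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]


def scale(c, v):
    return [c * entry for entry in v]


def is_eigenpair(M, v, lam):
    """Definition EEM: v must be nonzero and satisfy Mv = lam v."""
    return any(entry != 0 for entry in v) and matvec(M, v) == scale(lam, v)


x = [1, -1, 2, 5]
z = [-3, 7, 0, 8]
w = [1, -1, 4, 0]
u = scale(30, x)                      # a scalar multiple of x
v = [a + b for a, b in zip(z, w)]     # the sum z + w

print(is_eigenpair(A, x, 4))  # True
print(is_eigenpair(A, z, 2))  # True
print(is_eigenpair(A, w, 2))  # True
print(is_eigenpair(A, u, 4))  # True
print(is_eigenpair(A, v, 2))  # True
print(is_eigenpair(A, x, 2))  # False: x belongs to the eigenvalue 4, not 2
```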
Subsection PM

Polynomials and Matrices

A polynomial  is  a combination  of  powers,  multiplication  by  scalar  coefficients,  and 
addition  (with  subtraction  just  being  the  inverse  of  addition).  We  never  have  occasion 
to  divide  when  computing  the  value  of  a polynomial.  So  it  is  with  matrices.  We  can 
add  and  subtract  matrices,  we  can  multiply  matrices  by  scalars,  and  we  can  form 
powers  of  square  matrices  by  repeated  applications  of  matrix  multiplication.  We  do 
not  normally  divide  matrices  (though  sometimes  we  can  multiply  by  an  inverse).  If  a 
matrix  is  square,  all  the  operations  constituting  a polynomial  will  preserve  the  size 
of  the  matrix.  So  it  is  natural  to  consider  evaluating  a polynomial  with  a matrix, 
effectively  replacing  the  variable  of  the  polynomial  by  a matrix.  We  will  demonstrate 
with  an  example. 

Example  PM  Polynomial  of  a matrix 
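Let

\[ p(x) = 14 + 19x - 3x^2 - 7x^3 + x^4 \qquad D = \begin{bmatrix} -1 & 3 & 2 \\ 1 & 0 & -2 \\ -3 & 1 & 1 \end{bmatrix} \]

and we will compute p(D).

First, the necessary powers of D. Notice that $D^0$ is defined to be the multiplicative
identity, $I_3$, as will be the case in general.

\begin{align*}
D^0 &= I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \\
D^1 &= D = \begin{bmatrix} -1 & 3 & 2 \\ 1 & 0 & -2 \\ -3 & 1 & 1 \end{bmatrix} \\
D^2 &= DD^1 = \begin{bmatrix} -1 & 3 & 2 \\ 1 & 0 & -2 \\ -3 & 1 & 1 \end{bmatrix} \begin{bmatrix} -1 & 3 & 2 \\ 1 & 0 & -2 \\ -3 & 1 & 1 \end{bmatrix} = \begin{bmatrix} -2 & -1 & -6 \\ 5 & 1 & 0 \\ 1 & -8 & -7 \end{bmatrix} \\
D^3 &= DD^2 = \begin{bmatrix} -1 & 3 & 2 \\ 1 & 0 & -2 \\ -3 & 1 & 1 \end{bmatrix} \begin{bmatrix} -2 & -1 & -6 \\ 5 & 1 & 0 \\ 1 & -8 & -7 \end{bmatrix} = \begin{bmatrix} 19 & -12 & -8 \\ -4 & 15 & 8 \\ 12 & -4 & 11 \end{bmatrix} \\
D^4 &= DD^3 = \begin{bmatrix} -1 & 3 & 2 \\ 1 & 0 & -2 \\ -3 & 1 & 1 \end{bmatrix} \begin{bmatrix} 19 & -12 & -8 \\ -4 & 15 & 8 \\ 12 & -4 & 11 \end{bmatrix} = \begin{bmatrix} -7 & 49 & 54 \\ -5 & -4 & -30 \\ -49 & 47 & 43 \end{bmatrix}
\end{align*}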

Then 


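\begin{align*}
p(D) &= 14 + 19D - 3D^2 - 7D^3 + D^4 \\
&= 14 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} + 19 \begin{bmatrix} -1 & 3 & 2 \\ 1 & 0 & -2 \\ -3 & 1 & 1 \end{bmatrix} - 3 \begin{bmatrix} -2 & -1 & -6 \\ 5 & 1 & 0 \\ 1 & -8 & -7 \end{bmatrix} \\
&\quad - 7 \begin{bmatrix} 19 & -12 & -8 \\ -4 & 15 & 8 \\ 12 & -4 & 11 \end{bmatrix} + \begin{bmatrix} -7 & 49 & 54 \\ -5 & -4 & -30 \\ -49 & 47 & 43 \end{bmatrix} \\
&= \begin{bmatrix} -139 & 193 & 166 \\ 27 & -98 & -124 \\ -193 & 118 & 20 \end{bmatrix}
\end{align*}
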

Notice that p(x) factors as

\[ p(x) = 14 + 19x - 3x^2 - 7x^3 + x^4 = (x - 2)(x - 7)(x + 1)^2 \]

Because D commutes with itself (DD = DD), we can use distributivity of matrix
multiplication across matrix addition (Theorem MMDAA) without being careful
with any of the matrix products, and just as easily evaluate p(D) using the factored
form of p(x),

\begin{align*}
p(D) &= 14 + 19D - 3D^2 - 7D^3 + D^4 = (D - 2I_3)(D - 7I_3)(D + I_3)^2 \\
&= \begin{bmatrix} -3 & 3 & 2 \\ 1 & -2 & -2 \\ -3 & 1 & -1 \end{bmatrix} \begin{bmatrix} -8 & 3 & 2 \\ 1 & -7 & -2 \\ -3 & 1 & -6 \end{bmatrix} \begin{bmatrix} 0 & 3 & 2 \\ 1 & 1 & -2 \\ -3 & 1 & 2 \end{bmatrix}^2 \\
&= \begin{bmatrix} -139 & 193 & 166 \\ 27 & -98 & -124 \\ -193 & 118 & 20 \end{bmatrix}
\end{align*}


This  example  is  not  meant  to  be  too  profound.  It  is  meant  to  show  you  that  it  is 
natural  to  evaluate  a polynomial  with  a matrix,  and  that  the  factored  form  of  the 
polynomial  is  as  good  as  (or  maybe  better  than)  the  expanded  form.  And  do  not 
forget  that  constant  terms  in  polynomials  are  really  multiples  of  the  identity  matrix 
when we are evaluating the polynomial with a matrix. △
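
Both evaluations in Example PM are easy to reproduce. In the Python sketch below (helper names are ours) the expanded form is evaluated by Horner's rule, which is a convenience we have substituted for computing the powers of D one at a time as the example does, and the factored form is evaluated as a product of four matrices. Notice how each constant enters as a multiple of the identity matrix.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def add_scalar_identity(M, c):
    """Return M + c I, so that constants act as multiples of the identity."""
    return [[entry + (c if r == k else 0) for k, entry in enumerate(row)]
            for r, row in enumerate(M)]


def poly_at_matrix(coeffs, M):
    """Evaluate a_0 I + a_1 M + ... + a_m M^m by Horner's rule.

    coeffs lists a_0, a_1, ..., a_m in increasing order of degree.
    """
    n = len(M)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = add_scalar_identity(matmul(result, M), c)
    return result


D = [[-1, 3, 2],
     [1, 0, -2],
     [-3, 1, 1]]

# Expanded form: p(x) = 14 + 19x - 3x^2 - 7x^3 + x^4
expanded = poly_at_matrix([14, 19, -3, -7, 1], D)

# Factored form: p(x) = (x - 2)(x - 7)(x + 1)^2
factors = [add_scalar_identity(D, -2), add_scalar_identity(D, -7),
           add_scalar_identity(D, 1), add_scalar_identity(D, 1)]
factored = factors[0]
for F in factors[1:]:
    factored = matmul(factored, F)

print(expanded)              # [[-139, 193, 166], [27, -98, -124], [-193, 118, 20]]
print(expanded == factored)  # True
```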


Subsection  EEE 

Existence  of  Eigenvalues  and  Eigenvectors 

Before  we  embark  on  computing  eigenvalues  and  eigenvectors,  we  will  prove  that 
every  matrix  has  at  least  one  eigenvalue  (and  an  eigenvector  to  go  with  it).  Later,  in 
Theorem  MNEM,  we  will  determine  the  maximum  number  of  eigenvalues  a matrix 
may  have. 

The determinant (Definition DM) will be a powerful tool in Subsection EE.CEE
when it comes time to compute eigenvalues. However, it is possible, with some
more advanced machinery, to compute eigenvalues without ever making use of the
determinant. Sheldon Axler does just that in his book, Linear Algebra Done Right.
Here and now, we give Axler’s “determinant-free” proof that every matrix has an
eigenvalue. The result is not too startling, but the proof is most enjoyable.

Theorem  EMHE  Every  Matrix  Has  an  Eigenvalue 

Suppose  A is  a square  matrix.  Then  A has  at  least  one  eigenvalue. 

Proof. Suppose that A has size n, and choose $\mathbf{x}$ as any nonzero vector from $\mathbb{C}^n$.
(Notice how much latitude we have in our choice of $\mathbf{x}$. Only the zero vector is
off-limits.) Consider the set

\[ S = \{\mathbf{x},\ A\mathbf{x},\ A^2\mathbf{x},\ A^3\mathbf{x},\ \dots,\ A^n\mathbf{x}\} \]

This is a set of $n + 1$ vectors from $\mathbb{C}^n$, so by Theorem MVSLD, S is linearly
dependent. Let $a_0, a_1, a_2, \dots, a_n$ be a collection of $n + 1$ scalars from $\mathbb{C}$, not all
zero, that provide a relation of linear dependence on S. In other words,

\[ a_0\mathbf{x} + a_1 A\mathbf{x} + a_2 A^2\mathbf{x} + a_3 A^3\mathbf{x} + \dots + a_n A^n\mathbf{x} = \mathbf{0} \]

Some of the $a_i$ are nonzero. Suppose that just $a_0 \neq 0$, and $a_1 = a_2 = a_3 = \dots = a_n = 0$.
Then $a_0\mathbf{x} = \mathbf{0}$ and by Theorem SMEZV, either $a_0 = 0$ or $\mathbf{x} = \mathbf{0}$, which are
both contradictions. So $a_i \neq 0$ for some $i \geq 1$. Let m be the largest integer such
that $a_m \neq 0$. From this discussion we know that $m \geq 1$. We can also assume that
$a_m = 1$, for if not, replace each $a_i$ by $a_i/a_m$ to obtain scalars that serve equally well
in providing a relation of linear dependence on S.

Define the polynomial

\[ p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots + a_m x^m \]

Because we have consistently used $\mathbb{C}$ as our set of scalars (rather than $\mathbb{R}$), we
know that we can factor p(x) into linear factors of the form $(x - b_i)$, where $b_i \in \mathbb{C}$.
So there are scalars, $b_1, b_2, b_3, \dots, b_m$, from $\mathbb{C}$ so that,

\[ p(x) = (x - b_m)(x - b_{m-1}) \cdots (x - b_3)(x - b_2)(x - b_1) \]

Put it all together and

\begin{align*}
\mathbf{0} &= a_0\mathbf{x} + a_1 A\mathbf{x} + a_2 A^2\mathbf{x} + \dots + a_n A^n\mathbf{x} \\
&= a_0\mathbf{x} + a_1 A\mathbf{x} + a_2 A^2\mathbf{x} + \dots + a_m A^m\mathbf{x} && a_i = 0 \text{ for } i > m \\
&= \left(a_0 I_n + a_1 A + a_2 A^2 + \dots + a_m A^m\right)\mathbf{x} && \text{Theorem MMDAA} \\
&= p(A)\mathbf{x} && \text{Definition of } p(x) \\
&= (A - b_m I_n)(A - b_{m-1} I_n) \cdots (A - b_2 I_n)(A - b_1 I_n)\mathbf{x}
\end{align*}

Let k be the smallest integer such that

\[ (A - b_k I_n)(A - b_{k-1} I_n) \cdots (A - b_2 I_n)(A - b_1 I_n)\mathbf{x} = \mathbf{0}. \]

From the preceding equation, we know that $k \leq m$. Define the vector $\mathbf{z}$ by

\[ \mathbf{z} = (A - b_{k-1} I_n) \cdots (A - b_2 I_n)(A - b_1 I_n)\mathbf{x} \]

Notice that by the definition of k, the vector $\mathbf{z}$ must be nonzero. In the case
where k = 1, we understand that $\mathbf{z}$ is defined by $\mathbf{z} = \mathbf{x}$, and $\mathbf{z}$ is still nonzero. Now

\[ (A - b_k I_n)\mathbf{z} = (A - b_k I_n)(A - b_{k-1} I_n) \cdots (A - b_3 I_n)(A - b_2 I_n)(A - b_1 I_n)\mathbf{x} = \mathbf{0} \]

which allows us to write

\begin{align*}
A\mathbf{z} &= (A + \mathcal{O})\mathbf{z} && \text{Property ZM} \\
&= (A - b_k I_n + b_k I_n)\mathbf{z} && \text{Property AIM} \\
&= (A - b_k I_n)\mathbf{z} + b_k I_n \mathbf{z} && \text{Theorem MMDAA} \\
&= \mathbf{0} + b_k I_n \mathbf{z} && \text{Defining property of } \mathbf{z} \\
&= b_k I_n \mathbf{z} && \text{Property ZM} \\
&= b_k \mathbf{z} && \text{Theorem MMIM}
\end{align*}

Since $\mathbf{z} \neq \mathbf{0}$, this equation says that $\mathbf{z}$ is an eigenvector of A for the eigenvalue
$\lambda = b_k$ (Definition EEM), so we have shown that any square matrix A does have at
least one eigenvalue. ■


The  proof  of  Theorem  EMHE  is  constructive  (it  contains  an  unambiguous 
procedure  that  leads  to  an  eigenvalue),  but  it  is  not  meant  to  be  practical.  We  will 
illustrate  the  theorem  with  an  example,  the  purpose  being  to  provide  a companion 
for  studying  the  proof  and  not  to  suggest  this  is  the  best  procedure  for  computing 
an  eigenvalue. 

Example  CAEHW  Computing  an  eigenvalue  the  hard  way 
This  example  illustrates  the  proof  of  Theorem  EMHE,  and  so  will  employ  the  same 
notation  as  the  proof  — look  there  for  full  explanations.  It  is  not  meant  to  be 
an  example  of  a reasonable  computational  approach  to  finding  eigenvalues  and 
eigenvectors.  OK,  warnings  in  place,  here  we  go. 

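Consider the matrix $A$, and choose the vector $\mathbf{x}$,

\[
A = \begin{bmatrix}
-7 & -1 & 11 & 0 & -4 \\
4 & 1 & 0 & 2 & 0 \\
-10 & -1 & 14 & 0 & -4 \\
8 & 2 & -15 & -1 & 5 \\
-10 & -1 & 16 & 0 & -6
\end{bmatrix}
\qquad
\mathbf{x} = \begin{bmatrix} 3 \\ 0 \\ 3 \\ -5 \\ 4 \end{bmatrix}
\]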

It  is  important  to  notice  that  the  choice  of  x could  be  anything , so  long  as  it  is 
not  the  zero  vector.  We  have  not  chosen  x totally  at  random,  but  so  as  to  make  our 
illustration  of  the  theorem  as  general  as  possible.  You  could  replicate  this  example 
with  your  own  choice  and  the  computations  are  guaranteed  to  be  reasonable,  provided 
you  have  a computational  tool  that  will  factor  a fifth  degree  polynomial  for  you. 

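The set
\begin{align*}
S &= \left\{\mathbf{x},\, A\mathbf{x},\, A^2\mathbf{x},\, A^3\mathbf{x},\, A^4\mathbf{x},\, A^5\mathbf{x}\right\} \\
&= \left\{
\begin{bmatrix} 3 \\ 0 \\ 3 \\ -5 \\ 4 \end{bmatrix},\,
\begin{bmatrix} -4 \\ 2 \\ -4 \\ 4 \\ -6 \end{bmatrix},\,
\begin{bmatrix} 6 \\ -6 \\ 6 \\ -2 \\ 10 \end{bmatrix},\,
\begin{bmatrix} -10 \\ 14 \\ -10 \\ -2 \\ -18 \end{bmatrix},\,
\begin{bmatrix} 18 \\ -30 \\ 18 \\ 10 \\ 34 \end{bmatrix},\,
\begin{bmatrix} -34 \\ 62 \\ -34 \\ -26 \\ -66 \end{bmatrix}
\right\}
\end{align*}
is guaranteed to be linearly dependent, as it has six vectors from $\mathbb{C}^5$ (Theorem
MVSLD).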

We  will  search  for  a nontrivial  relation  of  linear  dependence  by  solving  a homoge- 
neous system  of  equations  whose  coefficient  matrix  has  the  vectors  of  S as  columns 
through  row  operations, 


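\[
\begin{bmatrix}
3 & -4 & 6 & -10 & 18 & -34 \\
0 & 2 & -6 & 14 & -30 & 62 \\
3 & -4 & 6 & -10 & 18 & -34 \\
-5 & 4 & -2 & -2 & 10 & -26 \\
4 & -6 & 10 & -18 & 34 & -66
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & -2 & 6 & -14 & 30 \\
0 & 1 & -3 & 7 & -15 & 31 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]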


There are four free variables for describing solutions to this homogeneous system,
so we have our pick of solutions. The most expedient choice would be to set $x_3 = 1$
and $x_4 = x_5 = x_6 = 0$. However, we will again opt to maximize the generality of our
illustration of Theorem EMHE and choose $x_3 = -8$, $x_4 = -3$, $x_5 = 1$ and $x_6 = 0$.
This leads to a solution with $x_1 = 16$ and $x_2 = 12$.

This relation of linear dependence then says that
\begin{align*}
\mathbf{0} &= 16\mathbf{x} + 12A\mathbf{x} - 8A^2\mathbf{x} - 3A^3\mathbf{x} + A^4\mathbf{x} + 0A^5\mathbf{x} \\
\mathbf{0} &= \left(16 + 12A - 8A^2 - 3A^3 + A^4\right)\mathbf{x}
\end{align*}

So we define $p(x) = 16 + 12x - 8x^2 - 3x^3 + x^4$, and as advertised in the proof of
Theorem EMHE, we have a polynomial of degree $m = 4 \geq 1$ such that $p(A)\mathbf{x} = \mathbf{0}$.
Now we need to factor $p(x)$ over $\mathbb{C}$. If you made your own choice of $\mathbf{x}$ at the start,
this is where you might have a fifth degree polynomial, and where you might need
to use a computational tool to find roots and factors. We have

\[ p(x) = 16 + 12x - 8x^2 - 3x^3 + x^4 = (x - 4)(x + 2)(x - 2)(x + 1) \]


So we know that

\[ \mathbf{0} = p(A)\mathbf{x} = (A - 4I_5)(A + 2I_5)(A - 2I_5)(A + 1I_5)\mathbf{x} \]

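We apply one factor at a time, until we get the zero vector, so as to determine
the value of $k$ described in the proof of Theorem EMHE,

\begin{align*}
(A + 1I_5)\mathbf{x} &=
\begin{bmatrix}
-6 & -1 & 11 & 0 & -4 \\
4 & 2 & 0 & 2 & 0 \\
-10 & -1 & 15 & 0 & -4 \\
8 & 2 & -15 & 0 & 5 \\
-10 & -1 & 16 & 0 & -5
\end{bmatrix}
\begin{bmatrix} 3 \\ 0 \\ 3 \\ -5 \\ 4 \end{bmatrix}
= \begin{bmatrix} -1 \\ 2 \\ -1 \\ -1 \\ -2 \end{bmatrix} \\
(A - 2I_5)(A + 1I_5)\mathbf{x} &=
\begin{bmatrix}
-9 & -1 & 11 & 0 & -4 \\
4 & -1 & 0 & 2 & 0 \\
-10 & -1 & 12 & 0 & -4 \\
8 & 2 & -15 & -3 & 5 \\
-10 & -1 & 16 & 0 & -8
\end{bmatrix}
\begin{bmatrix} -1 \\ 2 \\ -1 \\ -1 \\ -2 \end{bmatrix}
= \begin{bmatrix} 4 \\ -8 \\ 4 \\ 4 \\ 8 \end{bmatrix} \\
(A + 2I_5)(A - 2I_5)(A + 1I_5)\mathbf{x} &=
\begin{bmatrix}
-5 & -1 & 11 & 0 & -4 \\
4 & 3 & 0 & 2 & 0 \\
-10 & -1 & 16 & 0 & -4 \\
8 & 2 & -15 & 1 & 5 \\
-10 & -1 & 16 & 0 & -4
\end{bmatrix}
\begin{bmatrix} 4 \\ -8 \\ 4 \\ 4 \\ 8 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\end{align*}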

So $k = 3$ and

\[
\mathbf{z} = (A - 2I_5)(A + 1I_5)\mathbf{x} = \begin{bmatrix} 4 \\ -8 \\ 4 \\ 4 \\ 8 \end{bmatrix}
\]

is an eigenvector of $A$ for the eigenvalue $\lambda = -2$, as you can check by doing the
computation $A\mathbf{z}$. If you work through this example with your own choice of the
vector $\mathbf{x}$ (strongly recommended) then the eigenvalue you will find may be different,
but will be in the set $\{3, 0, 1, -1, -2\}$. See Exercise EE.M60 for a suggested starting
vector. △
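The procedure of Example CAEHW can be replayed mechanically. The following is a minimal Python sketch, not part of the text's own toolset (which is Sage); the helper names `mat_vec`, `apply_factor` and `hard_way` are hypothetical, and the roots are entered in the order the factors are applied, $b_1 = -1$, $b_2 = 2$, $b_3 = -2$, $b_4 = 4$.

```python
# Illustrative sketch: replay Example CAEHW with exact integer arithmetic.
A = [[-7, -1, 11, 0, -4],
     [4, 1, 0, 2, 0],
     [-10, -1, 14, 0, -4],
     [8, 2, -15, -1, 5],
     [-10, -1, 16, 0, -6]]
x = [3, 0, 3, -5, 4]
roots = [-1, 2, -2, 4]  # b_1, b_2, b_3, b_4 from the factorization of p(x)


def mat_vec(M, v):
    """Matrix-vector product M v for a matrix stored as a list of rows."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]


def apply_factor(M, b, v):
    """Compute (M - b I) v without forming the matrix M - b I."""
    return [mv - b * c for mv, c in zip(mat_vec(M, v), v)]


def hard_way(M, v, bs):
    """Apply one factor at a time; return (k, b_k, z) when the zero vector appears."""
    z = v
    for k, b in enumerate(bs, start=1):
        w = apply_factor(M, b, z)
        if not any(w):          # w is the zero vector, so z is an eigenvector
            return k, b, z
        z = w
    raise ValueError("no factor produced the zero vector")


k, eigenvalue, z = hard_way(A, x, roots)
print(k, eigenvalue, z)
```

Changing `x` (and recomputing the polynomial and its roots) is the experiment the example recommends; the sketch only automates the final "one factor at a time" step.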


Subsection  CEE 

Computing  Eigenvalues  and  Eigenvectors 

Fortunately,  we  need  not  rely  on  the  procedure  of  Theorem  EMHE  each  time  we 
need  an  eigenvalue.  It  is  the  determinant,  and  specifically  Theorem  SMZD,  that 
provides  the  main  tool  for  computing  eigenvalues.  Here  is  an  informal  sequence  of 
equivalences  that  is  the  key  to  determining  the  eigenvalues  and  eigenvectors  of  a 
matrix, 

\[ A\mathbf{x} = \lambda\mathbf{x} \iff A\mathbf{x} - \lambda I_n\mathbf{x} = \mathbf{0} \iff (A - \lambda I_n)\mathbf{x} = \mathbf{0} \]

So, for an eigenvalue $\lambda$ and associated eigenvector $\mathbf{x} \neq \mathbf{0}$, the vector $\mathbf{x}$ will be
a nonzero element of the null space of $A - \lambda I_n$, while the matrix $A - \lambda I_n$ will
be singular and therefore have zero determinant. These ideas are made precise in
Theorem EMRCP and Theorem EMNS, but for now this brief discussion should
suffice as motivation for the following definition and example.

Definition CP Characteristic Polynomial

Suppose that $A$ is a square matrix of size $n$. Then the characteristic polynomial
of $A$ is the polynomial $p_A(x)$ defined by

\[ p_A(x) = \det(A - xI_n) \]
□


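Example CPMS3 Characteristic polynomial of a matrix, size 3
Consider

\[
F = \begin{bmatrix}
-13 & -8 & -4 \\
12 & 7 & 4 \\
24 & 16 & 7
\end{bmatrix}
\]

Then
\begin{align*}
p_F(x) &= \det(F - xI_3) \\
&= \begin{vmatrix}
-13 - x & -8 & -4 \\
12 & 7 - x & 4 \\
24 & 16 & 7 - x
\end{vmatrix} && \text{Definition CP} \\
&= (-13 - x)\begin{vmatrix} 7 - x & 4 \\ 16 & 7 - x \end{vmatrix}
 + (-8)(-1)\begin{vmatrix} 12 & 4 \\ 24 & 7 - x \end{vmatrix}
 + (-4)\begin{vmatrix} 12 & 7 - x \\ 24 & 16 \end{vmatrix} && \text{Definition DM} \\
&= (-13 - x)\left((7 - x)(7 - x) - 4(16)\right) \\
&\quad + (-8)(-1)\left(12(7 - x) - 4(24)\right) \\
&\quad + (-4)\left(12(16) - (7 - x)(24)\right) && \text{Theorem DMST} \\
&= 3 + 5x + x^2 - x^3 \\
&= -(x - 3)(x + 1)^2
\end{align*}
△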
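The expansion in Example CPMS3 can be spot-checked numerically. The sketch below (plain Python; `det3`, `char_poly_at`, `expanded` and `factored` are hypothetical helper names) evaluates $\det(F - xI_3)$ at eleven integers and compares with both the expanded and the factored form. Since two cubics that agree at more than three points are identical, agreement at all samples confirms the computation.

```python
# Illustrative sketch: check p_F(x) = det(F - x I_3) against its expanded and factored forms.
F = [[-13, -8, -4],
     [12, 7, 4],
     [24, 16, 7]]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_poly_at(M, x):
    """Evaluate det(M - x I_3) at a single number x."""
    shifted = [[M[r][c] - (x if r == c else 0) for c in range(3)] for r in range(3)]
    return det3(shifted)


def expanded(x):
    return 3 + 5 * x + x**2 - x**3


def factored(x):
    return -(x - 3) * (x + 1) ** 2


samples = range(-5, 6)
agree = all(char_poly_at(F, x) == expanded(x) == factored(x) for x in samples)
print(agree)
```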


The  characteristic  polynomial  is  our  main  computational  tool  for  finding  eigenval- 
ues, and  will  sometimes  be  used  to  aid  us  in  determining  the  properties  of  eigenvalues. 

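Theorem EMRCP Eigenvalues of a Matrix are Roots of Characteristic Polynomials
Suppose $A$ is a square matrix. Then $\lambda$ is an eigenvalue of $A$ if and only if $p_A(\lambda) = 0$.

Proof. Suppose $A$ has size $n$.
\begin{align*}
& \lambda \text{ is an eigenvalue of } A \\
\iff{} & \text{there exists } \mathbf{x} \neq \mathbf{0} \text{ so that } A\mathbf{x} = \lambda\mathbf{x} && \text{Definition EEM} \\
\iff{} & \text{there exists } \mathbf{x} \neq \mathbf{0} \text{ so that } A\mathbf{x} - \lambda\mathbf{x} = \mathbf{0} && \text{Property AIC} \\
\iff{} & \text{there exists } \mathbf{x} \neq \mathbf{0} \text{ so that } A\mathbf{x} - \lambda I_n\mathbf{x} = \mathbf{0} && \text{Theorem MMIM} \\
\iff{} & \text{there exists } \mathbf{x} \neq \mathbf{0} \text{ so that } (A - \lambda I_n)\mathbf{x} = \mathbf{0} && \text{Theorem MMDAA} \\
\iff{} & A - \lambda I_n \text{ is singular} && \text{Definition NM} \\
\iff{} & \det(A - \lambda I_n) = 0 && \text{Theorem SMZD} \\
\iff{} & p_A(\lambda) = 0 && \text{Definition CP}
\end{align*}
■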


Example EMS3 Eigenvalues of a matrix, size 3
In Example CPMS3 we found the characteristic polynomial of

\[
F = \begin{bmatrix}
-13 & -8 & -4 \\
12 & 7 & 4 \\
24 & 16 & 7
\end{bmatrix}
\]

to be $p_F(x) = -(x - 3)(x + 1)^2$. Since it is factored, we can find all of its roots easily: they are
$x = 3$ and $x = -1$. By Theorem EMRCP, $\lambda = 3$ and $\lambda = -1$ are both eigenvalues of
$F$, and these are the only eigenvalues of $F$. We have found them all. △


Let  us  now  turn  our  attention  to  the  computation  of  eigenvectors. 


Definition EM Eigenspace of a Matrix

Suppose that $A$ is a square matrix and $\lambda$ is an eigenvalue of $A$. Then the eigenspace
of $A$ for $\lambda$, $\mathcal{E}_A(\lambda)$, is the set of all the eigenvectors of $A$ for $\lambda$, together with the
inclusion of the zero vector. □


Example SEE hinted that the set of eigenvectors for a single eigenvalue might
have some closure properties, and with the addition of the one vector that is
never an eigenvector, $\mathbf{0}$, we indeed get a whole subspace.

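Theorem EMS Eigenspace for a Matrix is a Subspace

Suppose $A$ is a square matrix of size $n$ and $\lambda$ is an eigenvalue of $A$. Then the
eigenspace $\mathcal{E}_A(\lambda)$ is a subspace of the vector space $\mathbb{C}^n$.


Proof. We will check the three conditions of Theorem TSS. First, Definition EM
explicitly includes the zero vector in $\mathcal{E}_A(\lambda)$, so the set is nonempty.

Suppose that $\mathbf{x}, \mathbf{y} \in \mathcal{E}_A(\lambda)$, that is, $\mathbf{x}$ and $\mathbf{y}$ are two eigenvectors of $A$ for $\lambda$.
Then
\begin{align*}
A(\mathbf{x} + \mathbf{y}) &= A\mathbf{x} + A\mathbf{y} && \text{Theorem MMDAA} \\
&= \lambda\mathbf{x} + \lambda\mathbf{y} && \text{Definition EEM} \\
&= \lambda(\mathbf{x} + \mathbf{y}) && \text{Property DVAC}
\end{align*}

So either $\mathbf{x} + \mathbf{y} = \mathbf{0}$, or $\mathbf{x} + \mathbf{y}$ is an eigenvector of $A$ for $\lambda$ (Definition EEM). So,
in either event, $\mathbf{x} + \mathbf{y} \in \mathcal{E}_A(\lambda)$, and we have additive closure.

Suppose that $\alpha \in \mathbb{C}$, and that $\mathbf{x} \in \mathcal{E}_A(\lambda)$, that is, $\mathbf{x}$ is an eigenvector of $A$ for $\lambda$.
Then
\begin{align*}
A(\alpha\mathbf{x}) &= \alpha(A\mathbf{x}) && \text{Theorem MMSMM} \\
&= \alpha\lambda\mathbf{x} && \text{Definition EEM} \\
&= \lambda(\alpha\mathbf{x}) && \text{Property SMAC}
\end{align*}

So either $\alpha\mathbf{x} = \mathbf{0}$, or $\alpha\mathbf{x}$ is an eigenvector of $A$ for $\lambda$ (Definition EEM). So, in
either event, $\alpha\mathbf{x} \in \mathcal{E}_A(\lambda)$, and we have scalar closure.

With the three conditions of Theorem TSS met, we know $\mathcal{E}_A(\lambda)$ is a subspace. ■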

Theorem  EMS  tells  us  that  an  eigenspace  is  a subspace  (and  hence  a vector  space 
in  its  own  right).  Our  next  theorem  tells  us  how  to  quickly  construct  this  subspace. 

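Theorem EMNS Eigenspace of a Matrix is a Null Space

Suppose $A$ is a square matrix of size $n$ and $\lambda$ is an eigenvalue of $A$. Then

\[ \mathcal{E}_A(\lambda) = \mathcal{N}(A - \lambda I_n) \]


Proof. The conclusion of this theorem is an equality of sets, so normally we would
follow the advice of Definition SE. However, in this case we can construct a sequence
of equivalences which will together provide the two subset inclusions we need. First,
notice that $\mathbf{0} \in \mathcal{E}_A(\lambda)$ by Definition EM and $\mathbf{0} \in \mathcal{N}(A - \lambda I_n)$ by Theorem HSC.
Now consider any nonzero vector $\mathbf{x} \in \mathbb{C}^n$,
\begin{align*}
\mathbf{x} \in \mathcal{E}_A(\lambda)
&\iff A\mathbf{x} = \lambda\mathbf{x} && \text{Definition EM} \\
&\iff A\mathbf{x} - \lambda\mathbf{x} = \mathbf{0} && \text{Property AIC} \\
&\iff A\mathbf{x} - \lambda I_n\mathbf{x} = \mathbf{0} && \text{Theorem MMIM} \\
&\iff (A - \lambda I_n)\mathbf{x} = \mathbf{0} && \text{Theorem MMDAA} \\
&\iff \mathbf{x} \in \mathcal{N}(A - \lambda I_n) && \text{Definition NSM}
\end{align*}
■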


You  might  notice  the  close  parallels  (and  differences)  between  the  proofs  of 
Theorem  EMRCP  and  Theorem  EMNS.  Since  Theorem  EMNS  describes  the  set 
of  all  the  eigenvectors  of  A as  a null  space  we  can  use  techniques  such  as  Theorem 
BNS  to  provide  concise  descriptions  of  eigenspaces.  Theorem  EMNS  also  provides  a 
trivial  proof  for  Theorem  EMS. 


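Example ESMS3 Eigenspaces of a matrix, size 3

Example CPMS3 and Example EMS3 describe the characteristic polynomial and
eigenvalues of the $3 \times 3$ matrix

\[
F = \begin{bmatrix}
-13 & -8 & -4 \\
12 & 7 & 4 \\
24 & 16 & 7
\end{bmatrix}
\]

We will now take each eigenvalue in turn and compute its eigenspace. To do this,
we row-reduce the matrix $F - \lambda I_3$ in order to determine solutions to the homogeneous
system $\mathcal{LS}(F - \lambda I_3, \mathbf{0})$ and then express the eigenspace as the null space of $F - \lambda I_3$
(Theorem EMNS). Theorem BNS then tells us how to write the null space as the
span of a basis.

\begin{align*}
\lambda = 3 \qquad F - 3I_3 &=
\begin{bmatrix} -16 & -8 & -4 \\ 12 & 4 & 4 \\ 24 & 16 & 4 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 0 & \frac{1}{2} \\ 0 & 1 & -\frac{1}{2} \\ 0 & 0 & 0 \end{bmatrix} \\
\mathcal{E}_F(3) = \mathcal{N}(F - 3I_3) &=
\left\langle\left\{ \begin{bmatrix} -\frac{1}{2} \\ \frac{1}{2} \\ 1 \end{bmatrix} \right\}\right\rangle
= \left\langle\left\{ \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix} \right\}\right\rangle \\
\lambda = -1 \qquad F + 1I_3 &=
\begin{bmatrix} -12 & -8 & -4 \\ 12 & 8 & 4 \\ 24 & 16 & 8 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & \frac{2}{3} & \frac{1}{3} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \\
\mathcal{E}_F(-1) = \mathcal{N}(F + 1I_3) &=
\left\langle\left\{ \begin{bmatrix} -\frac{2}{3} \\ 1 \\ 0 \end{bmatrix},\, \begin{bmatrix} -\frac{1}{3} \\ 0 \\ 1 \end{bmatrix} \right\}\right\rangle
= \left\langle\left\{ \begin{bmatrix} -2 \\ 3 \\ 0 \end{bmatrix},\, \begin{bmatrix} -1 \\ 0 \\ 3 \end{bmatrix} \right\}\right\rangle
\end{align*}

Eigenspaces in hand, we can easily compute eigenvectors by forming nontrivial
linear combinations of the basis vectors describing each eigenspace. In particular,
notice that we can "pretty up" our basis vectors by using scalar multiples to clear
out fractions. △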
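Theorem EMNS turns the search for eigenvectors into a null space computation, which is easy to mechanize. Below is a minimal sketch in plain Python using exact rational arithmetic; the function names `rref`, `null_space_basis` and `eigenspace_basis` are hypothetical, and the basis is built one vector per free column in the style of Theorem BNS, before any scaling to clear fractions.

```python
# Illustrative sketch: eigenspace of M for lam as the null space of M - lam*I.
from fractions import Fraction


def rref(M):
    """Reduced row-echelon form over the rationals; returns (R, pivot_columns)."""
    R = [[Fraction(e) for e in row] for row in M]
    rows, cols = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [e / lead for e in R[r]]   # scale pivot row to a leading 1
        for i in range(rows):
            if i != r and R[i][c] != 0:
                factor = R[i][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return R, pivots


def null_space_basis(M):
    """One basis vector per free column of the row-reduced matrix."""
    R, pivots = rref(M)
    cols = len(M[0])
    free = [c for c in range(cols) if c not in pivots]
    basis = []
    for f in free:
        v = [Fraction(0)] * cols
        v[f] = Fraction(1)
        for row_index, p in enumerate(pivots):
            v[p] = -R[row_index][f]
        basis.append(v)
    return basis


def eigenspace_basis(M, lam):
    """Basis of N(M - lam*I); lam is assumed to be an eigenvalue of M."""
    n = len(M)
    shifted = [[M[r][c] - (lam if r == c else 0) for c in range(n)] for r in range(n)]
    return null_space_basis(shifted)


F = [[-13, -8, -4], [12, 7, 4], [24, 16, 7]]
print(eigenspace_basis(F, 3))
print(eigenspace_basis(F, -1))
```

If `lam` is not an eigenvalue, the shifted matrix is nonsingular and the returned basis is empty, which matches the trivial null space.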


Subsection  ECEE 

Examples  of  Computing  Eigenvalues  and  Eigenvectors 

There  are  no  theorems  in  this  section,  just  a selection  of  examples  meant  to  illustrate 
the  range  of  possibilities  for  the  eigenvalues  and  eigenvectors  of  a matrix.  These 
examples  can  all  be  done  by  hand,  though  the  computation  of  the  characteristic 
polynomial  would  be  very  time-consuming  and  error-prone.  It  can  also  be  difficult 
to  factor  an  arbitrary  polynomial,  though  if  we  were  to  suggest  that  most  of  our 
eigenvalues  are  going  to  be  integers,  then  it  can  be  easier  to  hunt  for  roots.  These 
examples  are  meant  to  look  similar  to  a concatenation  of  Example  CPMS3,  Example 
EMS3  and  Example  ESMS3.  First,  we  will  sneak  in  a pair  of  definitions  so  we  can 
illustrate  them  throughout  this  sequence  of  examples. 

Definition AME Algebraic Multiplicity of an Eigenvalue

Suppose that $A$ is a square matrix and $\lambda$ is an eigenvalue of $A$. Then the algebraic
multiplicity of $\lambda$, $\alpha_A(\lambda)$, is the highest power of $(x - \lambda)$ that divides the characteristic
polynomial, $p_A(x)$. □

Since an eigenvalue $\lambda$ is a root of the characteristic polynomial, there is always
a factor of $(x - \lambda)$, and the algebraic multiplicity is just the power of this factor
in a factorization of $p_A(x)$. So in particular, $\alpha_A(\lambda) \geq 1$. Compare the definition of
algebraic multiplicity with the next definition.

Definition GME Geometric Multiplicity of an Eigenvalue
Suppose that $A$ is a square matrix and $\lambda$ is an eigenvalue of $A$. Then the geometric
multiplicity of $\lambda$, $\gamma_A(\lambda)$, is the dimension of the eigenspace $\mathcal{E}_A(\lambda)$. □

Every eigenvalue must have at least one eigenvector, so the associated eigenspace
cannot be trivial, and so $\gamma_A(\lambda) \geq 1$.
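When the eigenvalues are expected to be integers, hunting for roots and their algebraic multiplicities can be automated. The sketch below is an assumption-laden illustration in plain Python: it relies on the standard fact that an integer root of a polynomial with integer coefficients and nonzero constant term must divide that constant term, and the names `evaluate`, `divide_by_linear` and `integer_roots` are hypothetical. It is applied to $p_F(x) = 3 + 5x + x^2 - x^3$ from Example CPMS3.

```python
# Illustrative sketch: find integer roots of a polynomial and their multiplicities.
def evaluate(coeffs, x):
    """Horner evaluation; coeffs are listed from the constant term upward."""
    total = 0
    for c in reversed(coeffs):
        total = total * x + c
    return total


def divide_by_linear(coeffs, root):
    """Synthetic division by (x - root); assumes root really is a root."""
    high_first = list(reversed(coeffs))
    quotient = [high_first[0]]
    for c in high_first[1:-1]:
        quotient.append(c + root * quotient[-1])
    return list(reversed(quotient))


def integer_roots(coeffs):
    """Map each integer root to its multiplicity (nonzero constant term assumed)."""
    constant = abs(coeffs[0])
    candidates = [d for d in range(1, constant + 1) if constant % d == 0]
    found = {}
    for d in candidates:
        for r in (d, -d):
            work = list(coeffs)
            # Divide out (x - r) for as long as r remains a root.
            while len(work) > 1 and evaluate(work, r) == 0:
                found[r] = found.get(r, 0) + 1
                work = divide_by_linear(work, r)
    return found


p_F = [3, 5, 1, -1]  # 3 + 5x + x^2 - x^3
print(integer_roots(p_F))
```

The multiplicities returned are algebraic multiplicities in the sense of Definition AME; complex or irrational roots are simply not found by this search.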


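Example EMMS4 Eigenvalue multiplicities, matrix of size 4
Consider the matrix

\[
B = \begin{bmatrix}
-2 & 1 & -2 & -4 \\
12 & 1 & 4 & 9 \\
6 & 5 & -2 & -4 \\
3 & -4 & 5 & 10
\end{bmatrix}
\]

then

\[ p_B(x) = 8 - 20x + 18x^2 - 7x^3 + x^4 = (x - 1)(x - 2)^3 \]

So the eigenvalues are $\lambda = 1, 2$ with algebraic multiplicities $\alpha_B(1) = 1$ and $\alpha_B(2) = 3$.

Computing eigenvectors,

\begin{align*}
\lambda = 1 \qquad B - 1I_4 &=
\begin{bmatrix} -3 & 1 & -2 & -4 \\ 12 & 0 & 4 & 9 \\ 6 & 5 & -3 & -4 \\ 3 & -4 & 5 & 9 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 0 & \frac{1}{3} & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \\
\mathcal{E}_B(1) = \mathcal{N}(B - 1I_4) &=
\left\langle\left\{ \begin{bmatrix} -\frac{1}{3} \\ 1 \\ 1 \\ 0 \end{bmatrix} \right\}\right\rangle
= \left\langle\left\{ \begin{bmatrix} -1 \\ 3 \\ 3 \\ 0 \end{bmatrix} \right\}\right\rangle \\
\lambda = 2 \qquad B - 2I_4 &=
\begin{bmatrix} -4 & 1 & -2 & -4 \\ 12 & -1 & 4 & 9 \\ 6 & 5 & -4 & -4 \\ 3 & -4 & 5 & 8 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 0 & 0 & \frac{1}{2} \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & \frac{1}{2} \\ 0 & 0 & 0 & 0 \end{bmatrix} \\
\mathcal{E}_B(2) = \mathcal{N}(B - 2I_4) &=
\left\langle\left\{ \begin{bmatrix} -\frac{1}{2} \\ 1 \\ -\frac{1}{2} \\ 1 \end{bmatrix} \right\}\right\rangle
= \left\langle\left\{ \begin{bmatrix} -1 \\ 2 \\ -1 \\ 2 \end{bmatrix} \right\}\right\rangle
\end{align*}

So each eigenspace has dimension 1 and so $\gamma_B(1) = 1$ and $\gamma_B(2) = 1$. This
example is of interest because of the discrepancy between the two multiplicities for
$\lambda = 2$. In many of our examples the algebraic and geometric multiplicities will be
equal for all of the eigenvalues (as it was for $\lambda = 1$ in this example), so keep this
example in mind. We will have some explanations for this phenomenon later (see
Example NDMS4). △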
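Since the geometric multiplicity is the dimension of a null space, it equals $n$ minus the rank of $B - \lambda I_n$. A short sketch in plain Python with exact fractions follows; the names `rank` and `geometric_multiplicity` are hypothetical, and the rank is found by ordinary Gaussian elimination rather than by any method prescribed in the text.

```python
# Illustrative sketch: geometric multiplicity as n - rank(M - lam*I).
from fractions import Fraction


def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    R = [[Fraction(e) for e in row] for row in M]
    rows, cols = len(R), len(R[0])
    r = 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, rows):
            factor = R[i][c] / R[r][c]
            R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == rows:
            break
    return r


def geometric_multiplicity(M, lam):
    """Dimension of N(M - lam*I); zero means lam is not an eigenvalue."""
    n = len(M)
    shifted = [[M[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(shifted)


B = [[-2, 1, -2, -4],
     [12, 1, 4, 9],
     [6, 5, -2, -4],
     [3, -4, 5, 10]]
print(geometric_multiplicity(B, 1), geometric_multiplicity(B, 2))
```

For $\lambda = 2$ the result is 1 even though the algebraic multiplicity is 3, which is exactly the discrepancy the example highlights.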
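Example ESMS4 Eigenvalues, symmetric matrix of size 4
Consider the matrix

\[
C = \begin{bmatrix}
1 & 0 & 1 & 1 \\
0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 \\
1 & 1 & 0 & 1
\end{bmatrix}
\]

then

\[ p_C(x) = -3 + 4x + 2x^2 - 4x^3 + x^4 = (x - 3)(x - 1)^2(x + 1) \]

So the eigenvalues are $\lambda = 3, 1, -1$ with algebraic multiplicities $\alpha_C(3) = 1$, $\alpha_C(1) = 2$
and $\alpha_C(-1) = 1$.

Computing eigenvectors,

\begin{align*}
\lambda = 3 \qquad C - 3I_4 &=
\begin{bmatrix} -2 & 0 & 1 & 1 \\ 0 & -2 & 1 & 1 \\ 1 & 1 & -2 & 0 \\ 1 & 1 & 0 & -2 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \\
\mathcal{E}_C(3) = \mathcal{N}(C - 3I_4) &=
\left\langle\left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \right\}\right\rangle \\
\lambda = 1 \qquad C - 1I_4 &=
\begin{bmatrix} 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \\
\mathcal{E}_C(1) = \mathcal{N}(C - 1I_4) &=
\left\langle\left\{ \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \end{bmatrix},\, \begin{bmatrix} 0 \\ 0 \\ -1 \\ 1 \end{bmatrix} \right\}\right\rangle \\
\lambda = -1 \qquad C + 1I_4 &=
\begin{bmatrix} 2 & 0 & 1 & 1 \\ 0 & 2 & 1 & 1 \\ 1 & 1 & 2 & 0 \\ 1 & 1 & 0 & 2 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \\
\mathcal{E}_C(-1) = \mathcal{N}(C + 1I_4) &=
\left\langle\left\{ \begin{bmatrix} -1 \\ -1 \\ 1 \\ 1 \end{bmatrix} \right\}\right\rangle
\end{align*}

So the eigenspace dimensions yield geometric multiplicities $\gamma_C(3) = 1$, $\gamma_C(1) = 2$
and $\gamma_C(-1) = 1$, the same as for the algebraic multiplicities. This example is of
interest because $C$ is a symmetric matrix, and will be the subject of Theorem HMRE. △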
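Example HMEM5 High multiplicity eigenvalues, matrix of size 5
Consider the matrix

\[
E = \begin{bmatrix}
29 & 14 & 2 & 6 & -9 \\
-47 & -22 & -1 & -11 & 13 \\
19 & 10 & 5 & 4 & -8 \\
-19 & -10 & -3 & -2 & 8 \\
7 & 4 & 3 & 1 & -3
\end{bmatrix}
\]

then

\[ p_E(x) = -16 + 16x + 8x^2 - 16x^3 + 7x^4 - x^5 = -(x - 2)^4(x + 1) \]

So the eigenvalues are $\lambda = 2, -1$ with algebraic multiplicities $\alpha_E(2) = 4$ and
$\alpha_E(-1) = 1$.

Computing eigenvectors,

\begin{align*}
\lambda = 2 \qquad E - 2I_5 &=
\begin{bmatrix}
27 & 14 & 2 & 6 & -9 \\
-47 & -24 & -1 & -11 & 13 \\
19 & 10 & 3 & 4 & -8 \\
-19 & -10 & -3 & -4 & 8 \\
7 & 4 & 3 & 1 & -5
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & -\frac{3}{2} & -\frac{1}{2} \\
0 & 0 & 1 & 0 & -1 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix} \\
\mathcal{E}_E(2) = \mathcal{N}(E - 2I_5) &=
\left\langle\left\{ \begin{bmatrix} -1 \\ \frac{3}{2} \\ 0 \\ 1 \\ 0 \end{bmatrix},\, \begin{bmatrix} 0 \\ \frac{1}{2} \\ 1 \\ 0 \\ 1 \end{bmatrix} \right\}\right\rangle
= \left\langle\left\{ \begin{bmatrix} -2 \\ 3 \\ 0 \\ 2 \\ 0 \end{bmatrix},\, \begin{bmatrix} 0 \\ 1 \\ 2 \\ 0 \\ 2 \end{bmatrix} \right\}\right\rangle \\
\lambda = -1 \qquad E + 1I_5 &=
\begin{bmatrix}
30 & 14 & 2 & 6 & -9 \\
-47 & -21 & -1 & -11 & 13 \\
19 & 10 & 6 & 4 & -8 \\
-19 & -10 & -3 & -1 & 8 \\
7 & 4 & 3 & 1 & -2
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 2 & 0 \\
0 & 1 & 0 & -4 & 0 \\
0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix} \\
\mathcal{E}_E(-1) = \mathcal{N}(E + 1I_5) &=
\left\langle\left\{ \begin{bmatrix} -2 \\ 4 \\ -1 \\ 1 \\ 0 \end{bmatrix} \right\}\right\rangle
\end{align*}

So the eigenspace dimensions yield geometric multiplicities $\gamma_E(2) = 2$ and
$\gamma_E(-1) = 1$. This example is of interest because $\lambda = 2$ has such a large algebraic
multiplicity, which is also not equal to its geometric multiplicity. △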

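Example CEMS6 Complex eigenvalues, matrix of size 6
Consider the matrix

\[
F = \begin{bmatrix}
-59 & -34 & 41 & 12 & 25 & 30 \\
1 & 7 & -46 & -36 & -11 & -29 \\
-233 & -119 & 58 & -35 & 75 & 54 \\
157 & 81 & -43 & 21 & -51 & -39 \\
-91 & -48 & 32 & -5 & 32 & 26 \\
209 & 107 & -55 & 28 & -69 & -50
\end{bmatrix}
\]

then
\begin{align*}
p_F(x) &= -50 + 55x + 13x^2 - 50x^3 + 32x^4 - 9x^5 + x^6 \\
&= (x - 2)(x + 1)(x^2 - 4x + 5)^2 \\
&= (x - 2)(x + 1)\left((x - (2 + i))(x - (2 - i))\right)^2 \\
&= (x - 2)(x + 1)(x - (2 + i))^2(x - (2 - i))^2
\end{align*}

So the eigenvalues are $\lambda = 2, -1, 2 + i, 2 - i$ with algebraic multiplicities $\alpha_F(2) = 1$,
$\alpha_F(-1) = 1$, $\alpha_F(2 + i) = 2$ and $\alpha_F(2 - i) = 2$.

We compute eigenvectors, noting that the last two basis vectors are each a scalar
multiple of what Theorem BNS will provide,

\begin{align*}
\lambda = 2 \qquad F - 2I_6 &=
\begin{bmatrix}
-61 & -34 & 41 & 12 & 25 & 30 \\
1 & 5 & -46 & -36 & -11 & -29 \\
-233 & -119 & 56 & -35 & 75 & 54 \\
157 & 81 & -43 & 19 & -51 & -39 \\
-91 & -48 & 32 & -5 & 30 & 26 \\
209 & 107 & -55 & 28 & -69 & -52
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & \frac{1}{5} \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & \frac{3}{5} \\
0 & 0 & 0 & 1 & 0 & -\frac{1}{5} \\
0 & 0 & 0 & 0 & 1 & \frac{4}{5} \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix} \\
\mathcal{E}_F(2) = \mathcal{N}(F - 2I_6) &=
\left\langle\left\{ \begin{bmatrix} -\frac{1}{5} \\ 0 \\ -\frac{3}{5} \\ \frac{1}{5} \\ -\frac{4}{5} \\ 1 \end{bmatrix} \right\}\right\rangle
= \left\langle\left\{ \begin{bmatrix} -1 \\ 0 \\ -3 \\ 1 \\ -4 \\ 5 \end{bmatrix} \right\}\right\rangle \\
\lambda = -1 \qquad F + 1I_6 &=
\begin{bmatrix}
-58 & -34 & 41 & 12 & 25 & 30 \\
1 & 8 & -46 & -36 & -11 & -29 \\
-233 & -119 & 59 & -35 & 75 & 54 \\
157 & 81 & -43 & 22 & -51 & -39 \\
-91 & -48 & 32 & -5 & 33 & 26 \\
209 & 107 & -55 & 28 & -69 & -49
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & \frac{1}{2} \\
0 & 1 & 0 & 0 & 0 & -\frac{3}{2} \\
0 & 0 & 1 & 0 & 0 & \frac{1}{2} \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & -\frac{1}{2} \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix} \\
\mathcal{E}_F(-1) = \mathcal{N}(F + 1I_6) &=
\left\langle\left\{ \begin{bmatrix} -\frac{1}{2} \\ \frac{3}{2} \\ -\frac{1}{2} \\ 0 \\ \frac{1}{2} \\ 1 \end{bmatrix} \right\}\right\rangle
= \left\langle\left\{ \begin{bmatrix} -1 \\ 3 \\ -1 \\ 0 \\ 1 \\ 2 \end{bmatrix} \right\}\right\rangle \\
\lambda = 2 + i \qquad F - (2 + i)I_6 &=
\begin{bmatrix}
-61 - i & -34 & 41 & 12 & 25 & 30 \\
1 & 5 - i & -46 & -36 & -11 & -29 \\
-233 & -119 & 56 - i & -35 & 75 & 54 \\
157 & 81 & -43 & 19 - i & -51 & -39 \\
-91 & -48 & 32 & -5 & 30 - i & 26 \\
209 & 107 & -55 & 28 & -69 & -52 - i
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & \frac{1}{5}(7 + i) \\
0 & 1 & 0 & 0 & 0 & \frac{1}{5}(-9 - 2i) \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & -1 \\
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix} \\
\mathcal{E}_F(2 + i) = \mathcal{N}(F - (2 + i)I_6) &=
\left\langle\left\{ \begin{bmatrix} -7 - i \\ 9 + 2i \\ -5 \\ 5 \\ -5 \\ 5 \end{bmatrix} \right\}\right\rangle \\
\lambda = 2 - i \qquad F - (2 - i)I_6 &=
\begin{bmatrix}
-61 + i & -34 & 41 & 12 & 25 & 30 \\
1 & 5 + i & -46 & -36 & -11 & -29 \\
-233 & -119 & 56 + i & -35 & 75 & 54 \\
157 & 81 & -43 & 19 + i & -51 & -39 \\
-91 & -48 & 32 & -5 & 30 + i & 26 \\
209 & 107 & -55 & 28 & -69 & -52 + i
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & \frac{1}{5}(7 - i) \\
0 & 1 & 0 & 0 & 0 & \frac{1}{5}(-9 + 2i) \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & -1 \\
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix} \\
\mathcal{E}_F(2 - i) = \mathcal{N}(F - (2 - i)I_6) &=
\left\langle\left\{ \begin{bmatrix} -7 + i \\ 9 - 2i \\ -5 \\ 5 \\ -5 \\ 5 \end{bmatrix} \right\}\right\rangle
\end{align*}

Eigenspace dimensions yield geometric multiplicities of $\gamma_F(2) = 1$, $\gamma_F(-1) = 1$,
$\gamma_F(2 + i) = 1$ and $\gamma_F(2 - i) = 1$. This example demonstrates some of the
possibilities for the appearance of complex eigenvalues, even when all the entries
of the matrix are real. Notice how all the numbers in the analysis of $\lambda = 2 - i$ are
conjugates of the corresponding number in the analysis of $\lambda = 2 + i$. This is the
content of the upcoming Theorem ERMCP. △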
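The complex eigenvectors of Example CEMS6 can be checked directly with Python's built-in complex numbers. The sketch below is only a verification aid; the helper name `residual` is hypothetical. It confirms that $F\mathbf{v} - \lambda\mathbf{v}$ vanishes for $\lambda = 2 + i$ and that conjugating both the eigenvalue and the eigenvector gives another valid pair, as the example observes for this real matrix.

```python
# Illustrative sketch: verify a complex eigenpair of a real matrix and its conjugate pair.
F = [[-59, -34, 41, 12, 25, 30],
     [1, 7, -46, -36, -11, -29],
     [-233, -119, 58, -35, 75, 54],
     [157, 81, -43, 21, -51, -39],
     [-91, -48, 32, -5, 32, 26],
     [209, 107, -55, 28, -69, -50]]


def residual(M, lam, v):
    """Return M v - lam v; it is the zero vector exactly when M v = lam v."""
    return [sum(m * c for m, c in zip(row, v)) - lam * vi for row, vi in zip(M, v)]


lam = 2 + 1j
v = [-7 - 1j, 9 + 2j, -5, 5, -5, 5]
v_bar = [complex(c).conjugate() for c in v]   # entrywise complex conjugate

print(residual(F, lam, v))
print(residual(F, lam.conjugate(), v_bar))
```

All entries here are small integers, so the floating-point complex arithmetic is exact and the residuals can be compared with zero directly.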


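Example DEMS5 Distinct eigenvalues, matrix of size 5
Consider the matrix

\[
H = \begin{bmatrix}
15 & 18 & -8 & 6 & -5 \\
5 & 3 & 1 & -1 & -3 \\
0 & -4 & 5 & -4 & -2 \\
-43 & -46 & 17 & -14 & 15 \\
26 & 30 & -12 & 8 & -10
\end{bmatrix}
\]

then

\[ p_H(x) = -6x + x^2 + 7x^3 - x^4 - x^5 = -x(x - 2)(x - 1)(x + 1)(x + 3) \]

So the eigenvalues are $\lambda = 2, 1, 0, -1, -3$ with algebraic multiplicities $\alpha_H(2) = 1$,
$\alpha_H(1) = 1$, $\alpha_H(0) = 1$, $\alpha_H(-1) = 1$ and $\alpha_H(-3) = 1$.

Computing eigenvectors,

\begin{align*}
\lambda = 2 \qquad H - 2I_5 &=
\begin{bmatrix}
13 & 18 & -8 & 6 & -5 \\
5 & 1 & 1 & -1 & -3 \\
0 & -4 & 3 & -4 & -2 \\
-43 & -46 & 17 & -16 & 15 \\
26 & 30 & -12 & 8 & -12
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & -1 \\
0 & 1 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 2 \\
0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix} \\
\mathcal{E}_H(2) = \mathcal{N}(H - 2I_5) &=
\left\langle\left\{ \begin{bmatrix} 1 \\ -1 \\ -2 \\ -1 \\ 1 \end{bmatrix} \right\}\right\rangle \\
\lambda = 1 \qquad H - 1I_5 &=
\begin{bmatrix}
14 & 18 & -8 & 6 & -5 \\
5 & 2 & 1 & -1 & -3 \\
0 & -4 & 4 & -4 & -2 \\
-43 & -46 & 17 & -15 & 15 \\
26 & 30 & -12 & 8 & -11
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & -\frac{1}{2} \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & \frac{1}{2} \\
0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix} \\
\mathcal{E}_H(1) = \mathcal{N}(H - 1I_5) &=
\left\langle\left\{ \begin{bmatrix} \frac{1}{2} \\ 0 \\ -\frac{1}{2} \\ -1 \\ 1 \end{bmatrix} \right\}\right\rangle
= \left\langle\left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \\ -2 \\ 2 \end{bmatrix} \right\}\right\rangle \\
\lambda = 0 \qquad H - 0I_5 &=
\begin{bmatrix}
15 & 18 & -8 & 6 & -5 \\
5 & 3 & 1 & -1 & -3 \\
0 & -4 & 5 & -4 & -2 \\
-43 & -46 & 17 & -14 & 15 \\
26 & 30 & -12 & 8 & -10
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & -2 \\
0 & 0 & 1 & 0 & -2 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix} \\
\mathcal{E}_H(0) = \mathcal{N}(H - 0I_5) &=
\left\langle\left\{ \begin{bmatrix} -1 \\ 2 \\ 2 \\ 0 \\ 1 \end{bmatrix} \right\}\right\rangle \\
\lambda = -1 \qquad H + 1I_5 &=
\begin{bmatrix}
16 & 18 & -8 & 6 & -5 \\
5 & 4 & 1 & -1 & -3 \\
0 & -4 & 6 & -4 & -2 \\
-43 & -46 & 17 & -13 & 15 \\
26 & 30 & -12 & 8 & -9
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & -\frac{1}{2} \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & \frac{1}{2} \\
0 & 0 & 0 & 0 & 0
\end{bmatrix} \\
\mathcal{E}_H(-1) = \mathcal{N}(H + 1I_5) &=
\left\langle\left\{ \begin{bmatrix} \frac{1}{2} \\ 0 \\ 0 \\ -\frac{1}{2} \\ 1 \end{bmatrix} \right\}\right\rangle
= \left\langle\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \\ -1 \\ 2 \end{bmatrix} \right\}\right\rangle \\
\lambda = -3 \qquad H + 3I_5 &=
\begin{bmatrix}
18 & 18 & -8 & 6 & -5 \\
5 & 6 & 1 & -1 & -3 \\
0 & -4 & 8 & -4 & -2 \\
-43 & -46 & 17 & -11 & 15 \\
26 & 30 & -12 & 8 & -7
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & -1 \\
0 & 1 & 0 & 0 & \frac{1}{2} \\
0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 2 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix} \\
\mathcal{E}_H(-3) = \mathcal{N}(H + 3I_5) &=
\left\langle\left\{ \begin{bmatrix} 1 \\ -\frac{1}{2} \\ -1 \\ -2 \\ 1 \end{bmatrix} \right\}\right\rangle
= \left\langle\left\{ \begin{bmatrix} -2 \\ 1 \\ 2 \\ 4 \\ -2 \end{bmatrix} \right\}\right\rangle
\end{align*}

So the eigenspace dimensions yield geometric multiplicities $\gamma_H(2) = 1$, $\gamma_H(1) = 1$,
$\gamma_H(0) = 1$, $\gamma_H(-1) = 1$ and $\gamma_H(-3) = 1$, identical to the algebraic multiplicities.
This example is of interest for two reasons. First, $\lambda = 0$ is an eigenvalue, illustrating
the upcoming Theorem SMZE. Second, all the eigenvalues are distinct, yielding
algebraic and geometric multiplicities of 1 for each eigenvalue, illustrating Theorem
DED. △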


Reading  Questions 

1. Suppose $A$ is the $2 \times 2$ matrix

\[ A = \begin{bmatrix} -5 & 8 \\ -4 & 7 \end{bmatrix} \]

Find the eigenvalues of $A$.

2. For each eigenvalue of $A$, find the corresponding eigenspace.

3. For the polynomial $p(x) = 3x^2 - x + 2$ and $A$ from above, compute $p(A)$.
Exercises


C10† Find the characteristic polynomial of the matrix $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$.

C11† Find the characteristic polynomial of the matrix $A = \begin{bmatrix} 3 & 2 & 1 \\ 0 & 1 & 1 \\ 1 & 2 & 0 \end{bmatrix}$.

C12† Find the characteristic polynomial of the matrix $A = \begin{bmatrix} 1 & 2 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 2 & 1 & 1 & 0 \\ 3 & 1 & 0 & 1 \end{bmatrix}$.


C19† Find the eigenvalues, eigenspaces, algebraic multiplicities and geometric multiplicities
for the matrix below. It is possible to do all these computations by hand, and it would
be instructive to do so.

\[ C = \begin{bmatrix} -1 & 2 \\ -6 & 6 \end{bmatrix} \]

C20† Find the eigenvalues, eigenspaces, algebraic multiplicities and geometric multiplicities
for the matrix below. It is possible to do all these computations by hand, and it would
be instructive to do so.

\[ B = \begin{bmatrix} -12 & 30 \\ -5 & 13 \end{bmatrix} \]

C21† The matrix $A$ below has $\lambda = 2$ as an eigenvalue. Find the geometric multiplicity of
$\lambda = 2$ using your calculator only for row-reducing matrices.

\[
A = \begin{bmatrix}
18 & -15 & 33 & -15 \\
-4 & 8 & -6 & 6 \\
-9 & 9 & -16 & 9 \\
5 & -6 & 9 & -4
\end{bmatrix}
\]

C22† Without using a calculator, find the eigenvalues of the matrix B.


C23† Find the eigenvalues, eigenspaces, algebraic and geometric multiplicities for
$A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$.

C24† Find the eigenvalues, eigenspaces, algebraic and geometric multiplicities for
$A = \begin{bmatrix} 1 & -1 & 1 \\ -1 & 1 & -1 \\ 1 & -1 & 1 \end{bmatrix}$.

C25† Find the eigenvalues, eigenspaces, algebraic and geometric multiplicities for the
$3 \times 3$ identity matrix $I_3$. Do your results make sense?
C26† For matrix $A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}$, the characteristic polynomial of $A$ is
$p_A(x) = (4 - x)(1 - x)^2$. Find the eigenvalues and corresponding eigenspaces of $A$.

C27† For matrix $A = \begin{bmatrix} 0 & 4 & -1 & 1 \\ -2 & 6 & -1 & 1 \\ -2 & 8 & -1 & -1 \\ -2 & 8 & -3 & 1 \end{bmatrix}$, the characteristic polynomial of $A$ is
$p_A(x) = (x + 2)(x - 2)^2(x - 4)$. Find the eigenvalues and corresponding eigenspaces of $A$.

M60† Repeat Example CAEHW by choosing $\mathbf{x} = \begin{bmatrix} 0 \\ 8 \\ 2 \\ 1 \\ 2 \end{bmatrix}$ and then arrive at an eigenvalue
and eigenvector of the matrix $A$. The hard way.

T10† A matrix $A$ is idempotent if $A^2 = A$. Show that the only possible eigenvalues of
an idempotent matrix are $\lambda = 0$ and $\lambda = 1$. Then give an example of a matrix that is
idempotent and has both of these two values as eigenvalues.

T15† The characteristic polynomial of the square matrix $A$ is usually defined as $r_A(x) =
\det(xI_n - A)$. Find a specific relationship between our characteristic polynomial, $p_A(x)$,
and $r_A(x)$, give a proof of your relationship, and use this to explain why Theorem EMRCP
can remain essentially unchanged with either definition. Explain the advantages of each
definition over the other. (Computing with both definitions, for a $2 \times 2$ and a $3 \times 3$ matrix,
might be a good way to start.)

T20† Suppose that $\lambda$ and $\rho$ are two different eigenvalues of the square matrix $A$. Prove
that the intersection of the eigenspaces for these two eigenvalues is trivial. That is,
$\mathcal{E}_A(\lambda) \cap \mathcal{E}_A(\rho) = \{\mathbf{0}\}$.


Section  PEE 

Properties  of  Eigenvalues  and  Eigenvectors 


The  previous  section  introduced  eigenvalues  and  eigenvectors,  and  concentrated  on 
their  existence  and  determination.  This  section  will  be  more  about  theorems,  and 
the  various  properties  eigenvalues  and  eigenvectors  enjoy.  Like  a good  4 x 100  meter 
relay,  we  will  lead-off  with  one  of  our  better  theorems  and  save  the  very  best  for  the 
anchor  leg. 

Subsection  BPE 

Basic  Properties  of  Eigenvalues 

Theorem EDELI Eigenvectors with Distinct Eigenvalues are Linearly Independent
Suppose that $A$ is an $n \times n$ square matrix and $S = \{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_p\}$ is a set of
eigenvectors with eigenvalues $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_p$ such that $\lambda_i \neq \lambda_j$ whenever $i \neq j$.
Then $S$ is a linearly independent set.

Proof. If $p = 1$, then the set $S = \{\mathbf{x}_1\}$ is linearly independent since eigenvectors are
nonzero (Definition EEM), so assume for the remainder that $p \geq 2$.

We will prove this result by contradiction (Proof Technique CD). Suppose to the
contrary that $S$ is a linearly dependent set. Define $S_i = \{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_i\}$ and
let $k$ be an integer such that $S_{k-1} = \{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_{k-1}\}$ is linearly independent
and $S_k = \{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_k\}$ is linearly dependent. We have to ask if there is
even such an integer $k$. First, since eigenvectors are nonzero, the set $\{\mathbf{x}_1\}$ is linearly
independent. Since we are assuming that $S = S_p$ is linearly dependent, there must
be an integer $k$, $2 \leq k \leq p$, where the sets $S_i$ transition from linear independence to
linear dependence (and stay that way). In other words, $\mathbf{x}_k$ is the vector with the
smallest index that is a linear combination of just vectors with smaller indices.

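Since $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_k\}$ is a linearly dependent set there must be scalars,
$a_1, a_2, a_3, \ldots, a_k$, not all zero (Definition LI), so that

\[ \mathbf{0} = a_1\mathbf{x}_1 + a_2\mathbf{x}_2 + a_3\mathbf{x}_3 + \cdots + a_k\mathbf{x}_k \]

Then,
\begin{align*}
\mathbf{0} &= (A - \lambda_k I_n)\mathbf{0} && \text{Theorem ZVSM} \\
&= (A - \lambda_k I_n)(a_1\mathbf{x}_1 + a_2\mathbf{x}_2 + a_3\mathbf{x}_3 + \cdots + a_k\mathbf{x}_k) && \text{Definition RLD} \\
&= (A - \lambda_k I_n)a_1\mathbf{x}_1 + \cdots + (A - \lambda_k I_n)a_k\mathbf{x}_k && \text{Theorem MMDAA} \\
&= a_1(A - \lambda_k I_n)\mathbf{x}_1 + \cdots + a_k(A - \lambda_k I_n)\mathbf{x}_k && \text{Theorem MMSMM} \\
&= a_1(A\mathbf{x}_1 - \lambda_k I_n\mathbf{x}_1) + \cdots + a_k(A\mathbf{x}_k - \lambda_k I_n\mathbf{x}_k) && \text{Theorem MMDAA} \\
&= a_1(A\mathbf{x}_1 - \lambda_k\mathbf{x}_1) + \cdots + a_k(A\mathbf{x}_k - \lambda_k\mathbf{x}_k) && \text{Theorem MMIM} \\
&= a_1(\lambda_1\mathbf{x}_1 - \lambda_k\mathbf{x}_1) + \cdots + a_k(\lambda_k\mathbf{x}_k - \lambda_k\mathbf{x}_k) && \text{Definition EEM} \\
&= a_1(\lambda_1 - \lambda_k)\mathbf{x}_1 + \cdots + a_k(\lambda_k - \lambda_k)\mathbf{x}_k && \text{Theorem MMDAA} \\
&= a_1(\lambda_1 - \lambda_k)\mathbf{x}_1 + \cdots + a_{k-1}(\lambda_{k-1} - \lambda_k)\mathbf{x}_{k-1} + a_k(0)\mathbf{x}_k && \text{Property AICN} \\
&= a_1(\lambda_1 - \lambda_k)\mathbf{x}_1 + \cdots + a_{k-1}(\lambda_{k-1} - \lambda_k)\mathbf{x}_{k-1} + \mathbf{0} && \text{Theorem ZSSM} \\
&= a_1(\lambda_1 - \lambda_k)\mathbf{x}_1 + \cdots + a_{k-1}(\lambda_{k-1} - \lambda_k)\mathbf{x}_{k-1} && \text{Property Z}
\end{align*}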

This equation is a relation of linear dependence on the linearly independent set
$\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_{k-1}\}$, so the scalars must all be zero. That is, $a_i(\lambda_i - \lambda_k) = 0$ for
$1 \leq i \leq k - 1$. However, we have the hypothesis that the eigenvalues are distinct, so
$\lambda_i \neq \lambda_k$ for $1 \leq i \leq k - 1$. Thus $a_i = 0$ for $1 \leq i \leq k - 1$.

This reduces the original relation of linear dependence on $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_k\}$
to the simpler equation $a_k\mathbf{x}_k = \mathbf{0}$. By Theorem SMEZV we conclude that $a_k = 0$ or
$\mathbf{x}_k = \mathbf{0}$. Eigenvectors are never the zero vector (Definition EEM), so $a_k = 0$. So all
of the scalars $a_i$, $1 \leq i \leq k$ are zero, contradicting their introduction as the scalars
creating a nontrivial relation of linear dependence on the set $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_k\}$.
With a contradiction in hand, we conclude that $S$ must be linearly independent. ■
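Theorem EDELI can be illustrated with the five distinct eigenvalues of the matrix $H$ from Example DEMS5. The sketch below (plain Python, exact fractions; `mat_vec` and `rank` are hypothetical helper names) first confirms that each listed vector is an eigenvector for its eigenvalue, then checks that the five vectors have rank 5, which is linear independence.

```python
# Illustrative sketch: eigenvectors for distinct eigenvalues form a linearly independent set.
from fractions import Fraction

H = [[15, 18, -8, 6, -5],
     [5, 3, 1, -1, -3],
     [0, -4, 5, -4, -2],
     [-43, -46, 17, -14, 15],
     [26, 30, -12, 8, -10]]

# One eigenvector per eigenvalue of H.
pairs = {2: [1, -1, -2, -1, 1],
         1: [1, 0, -1, -2, 2],
         0: [-1, 2, 2, 0, 1],
         -1: [1, 0, 0, -1, 2],
         -3: [-2, 1, 2, 4, -2]}


def mat_vec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]


def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    R = [[Fraction(e) for e in row] for row in M]
    rows, cols = len(R), len(R[0])
    r = 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, rows):
            factor = R[i][c] / R[r][c]
            R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == rows:
            break
    return r


are_eigenvectors = all(mat_vec(H, v) == [lam * c for c in v] for lam, v in pairs.items())
independent = rank(list(pairs.values())) == len(pairs)
print(are_eigenvectors, independent)
```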

There  is  a simple  connection  between  the  eigenvalues  of  a matrix  and  whether  or 
not  the  matrix  is  nonsingular. 

Theorem  SMZE  Singular  Matrices  have  Zero  Eigenvalues 

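Suppose $A$ is a square matrix. Then $A$ is singular if and only if $\lambda = 0$ is an eigenvalue
of $A$.

Proof. We have the following equivalences:
\begin{align*}
& A \text{ is singular} \\
\iff{} & \text{there exists } \mathbf{x} \neq \mathbf{0},\ A\mathbf{x} = \mathbf{0} && \text{Definition NM} \\
\iff{} & \text{there exists } \mathbf{x} \neq \mathbf{0},\ A\mathbf{x} = 0\mathbf{x} && \text{Theorem ZSSM} \\
\iff{} & \lambda = 0 \text{ is an eigenvalue of } A && \text{Definition EEM}
\end{align*}
■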
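Theorem SMZE can be seen in action on two matrices from the previous section: $H$ of Example DEMS5 has $\lambda = 0$ as an eigenvalue, while $F$ of Example CPMS3 has only the eigenvalues $3$ and $-1$. The following sketch (plain Python, exact fractions; the function name `det` is a hypothetical helper that uses row reduction rather than cofactor expansion) computes both determinants.

```python
# Illustrative sketch: zero is an eigenvalue exactly when the determinant is zero.
from fractions import Fraction


def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    R = [[Fraction(e) for e in row] for row in M]
    n = len(R)
    sign, result = 1, Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if R[i][c] != 0), None)
        if p is None:
            return Fraction(0)        # no pivot available: the matrix is singular
        if p != c:
            R[c], R[p] = R[p], R[c]
            sign = -sign              # a row swap flips the sign
        result *= R[c][c]
        for i in range(c + 1, n):
            factor = R[i][c] / R[c][c]
            R[i] = [a - factor * b for a, b in zip(R[i], R[c])]
    return sign * result


H = [[15, 18, -8, 6, -5],
     [5, 3, 1, -1, -3],
     [0, -4, 5, -4, -2],
     [-43, -46, 17, -14, 15],
     [26, 30, -12, 8, -10]]
F = [[-13, -8, -4], [12, 7, 4], [24, 16, 7]]

print(det(H), det(F))
```

The value of $\det(F)$ agrees with $p_F(0) = 3$, the constant term of the characteristic polynomial, as Definition CP requires.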


With  an  equivalence  about  singular  matrices  we  can  update  our  list  of  equivalences 
about  nonsingular  matrices. 

Theorem  NME8  Nonsingular  Matrix  Equivalences,  Round  8 
Suppose  that  A is  a square  matrix  of  size  n.  The  following  are  equivalent. 

1.  A is  nonsingular. 

2.  A row-reduces  to  the  identity  matrix. 

3. The null space of $A$ contains only the zero vector, $\mathcal{N}(A) = \{\mathbf{0}\}$.

4. The linear system $\mathcal{LS}(A, \mathbf{b})$ has a unique solution for every possible choice of $\mathbf{b}$.

5. The columns of $A$ are a linearly independent set.

6. $A$ is invertible.

7. The column space of $A$ is $\mathbb{C}^n$, $\mathcal{C}(A) = \mathbb{C}^n$.

8. The columns of $A$ are a basis for $\mathbb{C}^n$.

9. The rank of $A$ is $n$, $r(A) = n$.

10. The nullity of $A$ is zero, $n(A) = 0$.

11. The determinant of $A$ is nonzero, $\det(A) \ne 0$.

12. $\lambda = 0$ is not an eigenvalue of $A$.

Proof.  The  equivalence  of  the  first  and  last  statements  is  Theorem  SMZE,  reformu- 
lated by  negating  each  statement  in  the  equivalence.  So  we  are  able  to  improve  on 
Theorem  NME7  with  this  addition.  ■ 

Certain  changes  to  a matrix  change  its  eigenvalues  in  a predictable  way. 

Theorem  ESMM  Eigenvalues  of  a Scalar  Multiple  of  a Matrix 

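Suppose $A$ is a square matrix and $\lambda$ is an eigenvalue of $A$. Then $\alpha\lambda$ is an eigenvalue of $\alpha A$.

Proof. Let $\mathbf{x} \ne \mathbf{0}$ be one eigenvector of $A$ for $\lambda$. Then

\begin{align*}
(\alpha A)\mathbf{x} &= \alpha(A\mathbf{x}) && \text{Theorem MMSMM} \\
&= \alpha(\lambda\mathbf{x}) && \text{Definition EEM} \\
&= (\alpha\lambda)\mathbf{x} && \text{Property SMAC}
\end{align*}

So $\mathbf{x} \ne \mathbf{0}$ is an eigenvector of $\alpha A$ for the eigenvalue $\alpha\lambda$. ■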

Unfortunately,  there  are  not  parallel  theorems  about  the  sum  or  product  of 
arbitrary  matrices.  But  we  can  prove  a similar  result  for  powers  of  a matrix. 

Theorem  EOMP  Eigenvalues  Of  Matrix  Powers 

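Suppose $A$ is a square matrix, $\lambda$ is an eigenvalue of $A$, and $s \ge 0$ is an integer. Then $\lambda^s$ is an eigenvalue of $A^s$.

Proof. Let $\mathbf{x} \ne \mathbf{0}$ be one eigenvector of $A$ for $\lambda$. Suppose $A$ has size $n$. Then we proceed by induction on $s$ (Proof Technique I). First, for $s = 0$,

\begin{align*}
A^s\mathbf{x} &= A^0\mathbf{x} \\
&= I_n\mathbf{x} \\
&= \mathbf{x} && \text{Theorem MMIM} \\
&= 1\mathbf{x} && \text{Property OC} \\
&= \lambda^0\mathbf{x} \\
&= \lambda^s\mathbf{x}
\end{align*}

so $\lambda^s$ is an eigenvalue of $A^s$ in this special case. If we assume the theorem is true for $s$, then we find

\begin{align*}
A^{s+1}\mathbf{x} &= A^sA\mathbf{x} \\
&= A^s(\lambda\mathbf{x}) && \text{Definition EEM} \\
&= \lambda(A^s\mathbf{x}) && \text{Theorem MMSMM} \\
&= \lambda(\lambda^s\mathbf{x}) && \text{Induction hypothesis} \\
&= (\lambda\lambda^s)\mathbf{x} && \text{Property SMAC} \\
&= \lambda^{s+1}\mathbf{x}
\end{align*}

So $\mathbf{x} \ne \mathbf{0}$ is an eigenvector of $A^{s+1}$ for $\lambda^{s+1}$, and induction tells us the theorem is true for all $s \ge 0$. ■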


While  we  cannot  prove  that  the  sum  of  two  arbitrary  matrices  behaves  in  any 
reasonable  way  with  regard  to  eigenvalues,  we  can  work  with  the  sum  of  dissimilar 
powers  of  the  same  matrix.  We  have  already  seen  two  connections  between  eigenvalues 
and  polynomials,  in  the  proof  of  Theorem  EMHE  and  the  characteristic  polynomial 
(Definition  CP).  Our  next  theorem  strengthens  this  connection. 

Theorem  EPM  Eigenvalues  of  the  Polynomial  of  a Matrix 

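Suppose $A$ is a square matrix and $\lambda$ is an eigenvalue of $A$. Let $q(x)$ be a polynomial in the variable $x$. Then $q(\lambda)$ is an eigenvalue of the matrix $q(A)$.

Proof. Let $\mathbf{x} \ne \mathbf{0}$ be one eigenvector of $A$ for $\lambda$, and write $q(x) = a_0 + a_1x + a_2x^2 + \cdots + a_mx^m$. Then

\begin{align*}
q(A)\mathbf{x} &= \left(a_0A^0 + a_1A^1 + a_2A^2 + \cdots + a_mA^m\right)\mathbf{x} \\
&= (a_0A^0)\mathbf{x} + (a_1A^1)\mathbf{x} + (a_2A^2)\mathbf{x} + \cdots + (a_mA^m)\mathbf{x} && \text{Theorem MMDAA} \\
&= a_0(A^0\mathbf{x}) + a_1(A^1\mathbf{x}) + a_2(A^2\mathbf{x}) + \cdots + a_m(A^m\mathbf{x}) && \text{Theorem MMSMM} \\
&= a_0(\lambda^0\mathbf{x}) + a_1(\lambda^1\mathbf{x}) + a_2(\lambda^2\mathbf{x}) + \cdots + a_m(\lambda^m\mathbf{x}) && \text{Theorem EOMP} \\
&= (a_0\lambda^0)\mathbf{x} + (a_1\lambda^1)\mathbf{x} + (a_2\lambda^2)\mathbf{x} + \cdots + (a_m\lambda^m)\mathbf{x} && \text{Property SMAC} \\
&= \left(a_0\lambda^0 + a_1\lambda^1 + a_2\lambda^2 + \cdots + a_m\lambda^m\right)\mathbf{x} && \text{Property DSAC} \\
&= q(\lambda)\mathbf{x}
\end{align*}

So $\mathbf{x} \ne \mathbf{0}$ is an eigenvector of $q(A)$ for the eigenvalue $q(\lambda)$. ■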


Example BDE Building desired eigenvalues
In Example ESMS4 the $4 \times 4$ symmetric matrix

\[
C = \begin{bmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{bmatrix}
\]

is shown to have the three eigenvalues $\lambda = 3,\ 1,\ -1$. Suppose we wanted a $4 \times 4$ matrix that has the three eigenvalues $\lambda = 4,\ 0,\ -2$. We can employ Theorem EPM by finding a polynomial that converts 3 to 4, 1 to 0, and $-1$ to $-2$. Such a polynomial is called an interpolating polynomial, and in this example we can use

\[
r(x) = \frac{1}{4}x^2 + x - \frac{5}{4}
\]

We will not discuss how to concoct this polynomial, but a text on numerical analysis should provide the details. For now, simply verify that $r(3) = 4$, $r(1) = 0$ and $r(-1) = -2$.

Now compute

\begin{align*}
r(C) &= \frac{1}{4}C^2 + C - \frac{5}{4}I_4 \\
&= \frac{1}{4}\begin{bmatrix} 3 & 2 & 2 & 2 \\ 2 & 3 & 2 & 2 \\ 2 & 2 & 3 & 2 \\ 2 & 2 & 2 & 3 \end{bmatrix}
+ \begin{bmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{bmatrix}
- \frac{5}{4}\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \\
&= \frac{1}{2}\begin{bmatrix} 1 & 1 & 3 & 3 \\ 1 & 1 & 3 & 3 \\ 3 & 3 & 1 & 1 \\ 3 & 3 & 1 & 1 \end{bmatrix}
\end{align*}

Theorem EPM tells us that if $r(x)$ transforms the eigenvalues in the desired manner, then $r(C)$ will have the desired eigenvalues. You can check this by computing the eigenvalues of $r(C)$ directly. Furthermore, notice that the multiplicities are the same, and the eigenspaces of $C$ and $r(C)$ are identical. △
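The check suggested in Example BDE can be sketched in a few lines of plain Python with exact rational arithmetic. This is only an illustration, not part of the text's development: the helper names `matmul`, `matvec` and `r` are hypothetical, and the four eigenvectors of $C$ listed below were found by inspection (the code itself confirms that each one really is an eigenvector before using it).

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(X, v):
    """Product of a matrix with a column vector stored as a list."""
    return [sum(a * b for a, b in zip(row, v)) for row in X]

def r(t):
    """The interpolating polynomial r(x) = (1/4)x^2 + x - 5/4."""
    return Fraction(1, 4) * t * t + t - Fraction(5, 4)

C = [[1, 0, 1, 1],
     [0, 1, 1, 1],
     [1, 1, 1, 0],
     [1, 1, 0, 1]]

n = len(C)
C2 = matmul(C, C)
# r(C) = (1/4) C^2 + C - (5/4) I_4, built entry by entry.
rC = [[Fraction(1, 4) * C2[i][j] + C[i][j] - (Fraction(5, 4) if i == j else 0)
       for j in range(n)] for i in range(n)]

# Eigenpairs of C found by inspection (an assumption of this sketch,
# confirmed by the first check inside the loop).
eigenpairs = [(3, [1, 1, 1, 1]),
              (1, [1, -1, 0, 0]),
              (1, [0, 0, 1, -1]),
              (-1, [1, 1, -1, -1])]

new_eigenvalues = []
for lam, x in eigenpairs:
    assert matvec(C, x) == [lam * xi for xi in x]       # C x = lambda x
    assert matvec(rC, x) == [r(lam) * xi for xi in x]   # r(C) x = r(lambda) x
    new_eigenvalues.append(r(lam))

print(new_eigenvalues)
```

Each eigenvector of $C$ is reused unchanged as an eigenvector of $r(C)$, which is exactly the mechanism in the proof of Theorem EPM.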


Inverses  and  transposes  also  behave  predictably  with  regard  to  their  eigenvalues. 


Theorem  EIM  Eigenvalues  of  the  Inverse  of  a Matrix 

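Suppose $A$ is a square nonsingular matrix and $\lambda$ is an eigenvalue of $A$. Then $\lambda^{-1}$ is an eigenvalue of the matrix $A^{-1}$.

Proof. Notice that since $A$ is assumed nonsingular, $A^{-1}$ exists by Theorem NI, but more importantly, $\lambda^{-1} = 1/\lambda$ does not involve division by zero since Theorem SMZE prohibits this possibility.

Let $\mathbf{x} \ne \mathbf{0}$ be one eigenvector of $A$ for $\lambda$. Suppose $A$ has size $n$. Then

\begin{align*}
A^{-1}\mathbf{x} &= A^{-1}(1\mathbf{x}) && \text{Property OC} \\
&= A^{-1}\left(\frac{1}{\lambda}\lambda\mathbf{x}\right) && \text{Property MICN} \\
&= \frac{1}{\lambda}A^{-1}(\lambda\mathbf{x}) && \text{Theorem MMSMM} \\
&= \frac{1}{\lambda}A^{-1}(A\mathbf{x}) && \text{Definition EEM} \\
&= \frac{1}{\lambda}(A^{-1}A)\mathbf{x} && \text{Theorem MMA} \\
&= \frac{1}{\lambda}I_n\mathbf{x} && \text{Definition MI} \\
&= \frac{1}{\lambda}\mathbf{x} && \text{Theorem MMIM}
\end{align*}

So $\mathbf{x} \ne \mathbf{0}$ is an eigenvector of $A^{-1}$ for the eigenvalue $\frac{1}{\lambda}$. ■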


The  proofs  of  the  theorems  above  have  a similar  style  to  them.  They  all  begin 
by  grabbing  an  eigenvalue-eigenvector  pair  and  adjusting  it  in  some  way  to  reach 
the  desired  conclusion.  You  should  add  this  to  your  toolkit  as  a general  approach  to 
proving  theorems  about  eigenvalues. 

So  far  we  have  been  able  to  reserve  the  characteristic  polynomial  for  strictly 
computational  purposes.  However,  sometimes  a theorem  about  eigenvalues  can  be 
proved  easily  by  employing  the  characteristic  polynomial  (rather  than  using  an 
eigenvalue-eigenvector  pair).  The  next  theorem  is  an  example  of  this. 

Theorem  ETM  Eigenvalues  of  the  Transpose  of  a Matrix 

Suppose $A$ is a square matrix and $\lambda$ is an eigenvalue of $A$. Then $\lambda$ is an eigenvalue of the matrix $A^t$.

Proof. Suppose $A$ has size $n$. Then

\begin{align*}
p_A(x) &= \det(A - xI_n) && \text{Definition CP} \\
&= \det\left((A - xI_n)^t\right) && \text{Theorem DT} \\
&= \det\left(A^t - (xI_n)^t\right) && \text{Theorem TMA} \\
&= \det\left(A^t - xI_n^t\right) && \text{Theorem TMSM} \\
&= \det\left(A^t - xI_n\right) && \text{Definition IM} \\
&= p_{A^t}(x) && \text{Definition CP}
\end{align*}

So $A$ and $A^t$ have the same characteristic polynomial, and by Theorem EMRCP, their eigenvalues are identical and have equal algebraic multiplicities. Notice that what we have proved here is a bit stronger than the stated conclusion in the theorem. ■


If  a matrix  has  only  real  entries,  then  the  computation  of  the  characteristic 
polynomial  (Definition  CP)  will  result  in  a polynomial  with  coefficients  that  are  real 
numbers.  Complex  numbers  could  result  as  roots  of  this  polynomial,  but  they  are 
roots of quadratic factors with real coefficients, and as such, come in conjugate pairs. The next theorem proves this, and a bit more, without mentioning the characteristic polynomial.

Theorem ERMCP Eigenvalues of Real Matrices come in Conjugate Pairs
Suppose $A$ is a square matrix with real entries and $\mathbf{x}$ is an eigenvector of $A$ for the eigenvalue $\lambda$. Then $\overline{\mathbf{x}}$ is an eigenvector of $A$ for the eigenvalue $\overline{\lambda}$.

Proof.

\begin{align*}
A\overline{\mathbf{x}} &= \overline{A}\,\overline{\mathbf{x}} && A \text{ has real entries} \\
&= \overline{A\mathbf{x}} && \text{Theorem MMCC} \\
&= \overline{\lambda\mathbf{x}} && \text{Definition EEM} \\
&= \overline{\lambda}\,\overline{\mathbf{x}} && \text{Theorem CRSM}
\end{align*}

So $\overline{\mathbf{x}}$ is an eigenvector of $A$ for the eigenvalue $\overline{\lambda}$. ■


This  phenomenon  is  amply  illustrated  in  Example  CEMS6,  where  the  four 
complex  eigenvalues  come  in  two  pairs,  and  the  two  basis  vectors  of  the  eigenspaces 
are  complex  conjugates  of  each  other.  Theorem  ERMCP  can  be  a time-saver  for 
computing  eigenvalues  and  eigenvectors  of  real  matrices  with  complex  eigenvalues, 
since  the  conjugate  eigenvalue  and  eigenspace  can  be  inferred  from  the  theorem 
rather  than  computed. 
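As a small illustration of how Theorem ERMCP saves work, here is a sketch in plain Python using its built-in complex numbers. The $2 \times 2$ matrix and the eigenpair are chosen for this sketch (they are not taken from Example CEMS6), and `matvec` is a hypothetical helper.

```python
def matvec(A, v):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# A real matrix whose characteristic polynomial is x^2 + 1,
# so its eigenvalues are i and -i.
A = [[0, -1],
     [1, 0]]

# One eigenpair, found by hand: A x = i x.
lam = 1j
x = [1, -1j]
assert matvec(A, x) == [lam * xi for xi in x]

# Theorem ERMCP predicts the second eigenpair with no further solving:
# conjugate the eigenvalue and every entry of the eigenvector.
lam_bar = lam.conjugate()
x_bar = [complex(xi).conjugate() for xi in x]
conjugate_pair_holds = matvec(A, x_bar) == [lam_bar * xi for xi in x_bar]

print(conjugate_pair_holds)
```

The entries here are small integers, so the floating-point complex arithmetic is exact and the equality test is safe; for general entries a tolerance would be needed.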


Subsection  ME 
Multiplicities  of  Eigenvalues 

A polynomial of degree $n$ will have exactly $n$ roots, provided we allow complex roots and count each root as many times as its multiplicity. From this fact about polynomial equations we can say more about the algebraic multiplicities of eigenvalues.

Theorem  DCP  Degree  of  the  Characteristic  Polynomial 

Suppose that $A$ is a square matrix of size $n$. Then the characteristic polynomial of $A$, $p_A(x)$, has degree $n$.

Proof. We will prove a more general result by induction (Proof Technique I). Then the theorem will be true as a special case. We will carefully state this result as a proposition indexed by $m$, $m \ge 1$.

$P(m)$: Suppose that $A$ is an $m \times m$ matrix whose entries are complex numbers or linear polynomials in the variable $x$ of the form $c - x$, where $c$ is a complex number. Suppose further that there are exactly $k$ entries that contain $x$ and that no row or column contains more than one such entry. Then, when $k = m$, $\det(A)$ is a polynomial in $x$ of degree $m$, with leading coefficient $\pm 1$, and when $k < m$, $\det(A)$ is a polynomial in $x$ of degree $k$ or less.

Base Case: Suppose $A$ is a $1 \times 1$ matrix. Then its determinant is equal to the lone entry (Definition DM). When $k = m = 1$, the entry is of the form $c - x$, a polynomial in $x$ of degree $m = 1$ with leading coefficient $-1$. When $k < m$, then $k = 0$ and the entry is simply a complex number, a polynomial of degree $0 \le k$. So $P(1)$ is true.

Induction Step: Assume $P(m)$ is true, and that $A$ is an $(m + 1) \times (m + 1)$ matrix with $k$ entries of the form $c - x$. There are two cases to consider.

Suppose $k = m + 1$. Then every row and every column will contain an entry of the form $c - x$. Suppose that for the first row, this entry is in column $t$. Compute the determinant of $A$ by an expansion about this first row (Definition DM). The term associated with entry $t$ of this row will be of the form

\[
(c - x)(-1)^{1+t}\det\left(A(1|t)\right)
\]

The submatrix $A(1|t)$ is an $m \times m$ matrix with $k = m$ terms of the form $c - x$, no more than one per row or column. By the induction hypothesis, $\det\left(A(1|t)\right)$ will be a polynomial in $x$ of degree $m$ with leading coefficient $\pm 1$. So this entire term is then a polynomial of degree $m + 1$ with leading coefficient $\pm 1$.

The remaining terms (which constitute the sum that is the determinant of $A$) are products of complex numbers from the first row with cofactors built from submatrices that lack the first row of $A$ and lack some column of $A$, other than column $t$. As such, these submatrices are $m \times m$ matrices with $k = m - 1 < m$ entries of the form $c - x$, no more than one per row or column. Applying the induction hypothesis, we see that these terms are polynomials in $x$ of degree $m - 1$ or less. Adding the single term from the entry in column $t$ with all these others, we see that $\det(A)$ is a polynomial in $x$ of degree $m + 1$ and leading coefficient $\pm 1$.

The second case occurs when $k < m + 1$. Now there is a row of $A$ that does not contain an entry of the form $c - x$. We consider the determinant of $A$ by expanding about this row (Theorem DER), whose entries are all complex numbers. The cofactors employed are built from submatrices that are $m \times m$ matrices with either $k$ or $k - 1$ entries of the form $c - x$, no more than one per row or column. In either case, $k \le m$, and we can apply the induction hypothesis to see that the determinants computed for the cofactors are all polynomials of degree $k$ or less. Summing these contributions to the determinant of $A$ yields a polynomial in $x$ of degree $k$ or less, as desired.

Definition CP tells us that the characteristic polynomial of an $n \times n$ matrix is the determinant of a matrix having exactly $n$ entries of the form $c - x$, no more than one per row or column. As such we can apply $P(n)$ to see that the characteristic polynomial has degree $n$. ■

Theorem  NEM  Number  of  Eigenvalues  of  a Matrix 

Suppose that $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_k$ are the distinct eigenvalues of a square matrix $A$ of size $n$. Then

\[
\sum_{i=1}^{k} \alpha_A(\lambda_i) = n
\]

Proof. By the definition of the algebraic multiplicity (Definition AME), we can factor the characteristic polynomial as

\[
p_A(x) = c(x - \lambda_1)^{\alpha_A(\lambda_1)}(x - \lambda_2)^{\alpha_A(\lambda_2)}(x - \lambda_3)^{\alpha_A(\lambda_3)} \cdots (x - \lambda_k)^{\alpha_A(\lambda_k)}
\]

where $c$ is a nonzero constant. (We could prove that $c = (-1)^n$, but we do not need that specificity right now. See Exercise PEE.T30.) The left-hand side is a polynomial of degree $n$ by Theorem DCP and the right-hand side is a polynomial of degree $\sum_{i=1}^{k} \alpha_A(\lambda_i)$. So the equality of the polynomials' degrees gives the equality $\sum_{i=1}^{k} \alpha_A(\lambda_i) = n$. ■

Theorem  ME  Multiplicities  of  an  Eigenvalue 

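Suppose that $A$ is a square matrix of size $n$ and $\lambda$ is an eigenvalue. Then

\[
1 \le \gamma_A(\lambda) \le \alpha_A(\lambda) \le n
\]

Proof. Since $\lambda$ is an eigenvalue of $A$, there is an eigenvector of $A$ for $\lambda$, $\mathbf{x}$. Then $\mathbf{x} \in \mathcal{E}_A(\lambda)$, so $\gamma_A(\lambda) \ge 1$, since we can extend $\{\mathbf{x}\}$ into a basis of $\mathcal{E}_A(\lambda)$ (Theorem ELIS).

To show $\gamma_A(\lambda) \le \alpha_A(\lambda)$ is the most involved portion of this proof. To this end, let $g = \gamma_A(\lambda)$ and let $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_g$ be a basis for the eigenspace of $\lambda$, $\mathcal{E}_A(\lambda)$. Construct another $n - g$ vectors, $\mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3, \ldots, \mathbf{y}_{n-g}$, so that

\[
\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_g, \mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3, \ldots, \mathbf{y}_{n-g}\}
\]

is a basis of $\mathbb{C}^n$. This can be done by repeated applications of Theorem ELIS. Finally, define a matrix $S$ by

\[
S = [\mathbf{x}_1|\mathbf{x}_2|\mathbf{x}_3| \ldots |\mathbf{x}_g|\mathbf{y}_1|\mathbf{y}_2|\mathbf{y}_3| \ldots |\mathbf{y}_{n-g}] = [\mathbf{x}_1|\mathbf{x}_2|\mathbf{x}_3| \ldots |\mathbf{x}_g|R]
\]

where $R$ is an $n \times (n - g)$ matrix whose columns are $\mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3, \ldots, \mathbf{y}_{n-g}$. The columns of $S$ are linearly independent by design, so $S$ is nonsingular (Theorem NMLIC) and therefore invertible (Theorem NI).

Then,

\begin{align*}
[\mathbf{e}_1|\mathbf{e}_2|\mathbf{e}_3| \ldots |\mathbf{e}_n] &= I_n \\
&= S^{-1}S \\
&= S^{-1}[\mathbf{x}_1|\mathbf{x}_2|\mathbf{x}_3| \ldots |\mathbf{x}_g|R] \\
&= [S^{-1}\mathbf{x}_1|S^{-1}\mathbf{x}_2|S^{-1}\mathbf{x}_3| \ldots |S^{-1}\mathbf{x}_g|S^{-1}R]
\end{align*}

So

\[
S^{-1}\mathbf{x}_i = \mathbf{e}_i \qquad 1 \le i \le g \qquad (*)
\]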

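Preparations in place, we compute the characteristic polynomial of $A$,

\begin{align*}
p_A(x) &= \det(A - xI_n) && \text{Definition CP} \\
&= 1\det(A - xI_n) && \text{Property OCN} \\
&= \det(I_n)\det(A - xI_n) && \text{Definition DM} \\
&= \det(S^{-1}S)\det(A - xI_n) && \text{Definition MI} \\
&= \det(S^{-1})\det(S)\det(A - xI_n) && \text{Theorem DRMM} \\
&= \det(S^{-1})\det(A - xI_n)\det(S) && \text{Property CMCN} \\
&= \det\left(S^{-1}(A - xI_n)S\right) && \text{Theorem DRMM} \\
&= \det\left(S^{-1}AS - S^{-1}xI_nS\right) && \text{Theorem MMDAA} \\
&= \det\left(S^{-1}AS - xS^{-1}I_nS\right) && \text{Theorem MMSMM} \\
&= \det\left(S^{-1}AS - xS^{-1}S\right) && \text{Theorem MMIM} \\
&= \det\left(S^{-1}AS - xI_n\right) && \text{Definition MI} \\
&= p_{S^{-1}AS}(x) && \text{Definition CP}
\end{align*}

What can we learn then about the matrix $S^{-1}AS$?

\begin{align*}
S^{-1}AS &= S^{-1}A[\mathbf{x}_1|\mathbf{x}_2|\mathbf{x}_3| \ldots |\mathbf{x}_g|R] \\
&= S^{-1}[A\mathbf{x}_1|A\mathbf{x}_2|A\mathbf{x}_3| \ldots |A\mathbf{x}_g|AR] && \text{Definition MM} \\
&= S^{-1}[\lambda\mathbf{x}_1|\lambda\mathbf{x}_2|\lambda\mathbf{x}_3| \ldots |\lambda\mathbf{x}_g|AR] && \text{Definition EEM} \\
&= [S^{-1}\lambda\mathbf{x}_1|S^{-1}\lambda\mathbf{x}_2|S^{-1}\lambda\mathbf{x}_3| \ldots |S^{-1}\lambda\mathbf{x}_g|S^{-1}AR] && \text{Definition MM} \\
&= [\lambda S^{-1}\mathbf{x}_1|\lambda S^{-1}\mathbf{x}_2|\lambda S^{-1}\mathbf{x}_3| \ldots |\lambda S^{-1}\mathbf{x}_g|S^{-1}AR] && \text{Theorem MMSMM} \\
&= [\lambda\mathbf{e}_1|\lambda\mathbf{e}_2|\lambda\mathbf{e}_3| \ldots |\lambda\mathbf{e}_g|S^{-1}AR] && S^{-1}S = I_n,\ ((*) \text{ above})
\end{align*}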


Now imagine computing the characteristic polynomial of $A$ by computing the characteristic polynomial of $S^{-1}AS$ using the form just obtained. The first $g$ columns of $S^{-1}AS$ are all zero, save for a $\lambda$ on the diagonal. So if we compute the determinant by expanding about the first column, successively, we will get successive factors of $(\lambda - x)$. More precisely, let $T$ be the square matrix of size $n - g$ that is formed from the last $n - g$ rows and last $n - g$ columns of $S^{-1}AR$. Then

\[
p_A(x) = p_{S^{-1}AS}(x) = (\lambda - x)^g\,p_T(x).
\]

This says that $(x - \lambda)$ is a factor of the characteristic polynomial at least $g$ times, so the algebraic multiplicity of $\lambda$ as an eigenvalue of $A$ is greater than or equal to $g$ (Definition AME). In other words,

\[
\gamma_A(\lambda) = g \le \alpha_A(\lambda)
\]

as desired.

Theorem NEM says that the sum of the algebraic multiplicities for all the eigenvalues of $A$ is equal to $n$. Since the algebraic multiplicity is a positive quantity, no single algebraic multiplicity can exceed $n$ without the sum of all of the algebraic multiplicities doing the same. ■
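The inequalities of Theorem ME can be strict, and a small computation makes that concrete. The sketch below is in plain Python with exact rational arithmetic. The upper triangular matrix is an illustrative choice, not one of the text's examples, and the helpers `rank` and `geometric_multiplicity` are hypothetical names. The algebraic multiplicities are read off the diagonal, since the determinant of an upper triangular matrix is the product of its diagonal entries.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix, by Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivot rows found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r

def geometric_multiplicity(A, lam):
    """Dimension of the eigenspace: the nullity of A - lam * I."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

A = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 3]]
n = len(A)

# A is upper triangular, so p_A(x) = (2 - x)^2 (3 - x).
alpha = {2: 2, 3: 1}
gamma = {lam: geometric_multiplicity(A, lam) for lam in alpha}

for lam in alpha:
    assert 1 <= gamma[lam] <= alpha[lam] <= n   # Theorem ME
assert sum(alpha.values()) == n                  # Theorem NEM

print(gamma)
```

For the eigenvalue 2 the geometric multiplicity is 1 while the algebraic multiplicity is 2, so the middle inequality of Theorem ME is strict for this matrix.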


Theorem MNEM Maximum Number of Eigenvalues of a Matrix
Suppose that $A$ is a square matrix of size $n$. Then $A$ cannot have more than $n$ distinct eigenvalues.

Proof. Suppose that $A$ has $k$ distinct eigenvalues, $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_k$. Then

\begin{align*}
k &= \sum_{i=1}^{k} 1 \\
&\le \sum_{i=1}^{k} \alpha_A(\lambda_i) && \text{Theorem ME} \\
&= n && \text{Theorem NEM}
\end{align*}
■


Subsection  EHM 

Eigenvalues  of  Hermitian  Matrices 

Recall  that  a matrix  is  Hermitian  (or  self-adjoint)  if  A = A*  (Definition  HM).  In 
the  case  where  A is  a matrix  whose  entries  are  all  real  numbers,  being  Hermitian 
is  identical  to  being  symmetric  (Definition  SYM).  Keep  this  in  mind  as  you  read 
the  next  two  theorems.  Their  hypotheses  could  be  changed  to  “suppose  A is  a real 
symmetric  matrix.” 

Theorem  HMRE  Hermitian  Matrices  have  Real  Eigenvalues 

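Suppose that $A$ is a Hermitian matrix and $\lambda$ is an eigenvalue of $A$. Then $\lambda \in \mathbb{R}$.

Proof. Let $\mathbf{x} \ne \mathbf{0}$ be one eigenvector of $A$ for the eigenvalue $\lambda$. Then by Theorem PIP we know $\langle\mathbf{x}, \mathbf{x}\rangle \ne 0$. So

\begin{align*}
\lambda &= \frac{1}{\langle\mathbf{x}, \mathbf{x}\rangle}\lambda\langle\mathbf{x}, \mathbf{x}\rangle && \text{Property MICN} \\
&= \frac{1}{\langle\mathbf{x}, \mathbf{x}\rangle}\langle\mathbf{x}, \lambda\mathbf{x}\rangle && \text{Theorem IPSM} \\
&= \frac{1}{\langle\mathbf{x}, \mathbf{x}\rangle}\langle\mathbf{x}, A\mathbf{x}\rangle && \text{Definition EEM} \\
&= \frac{1}{\langle\mathbf{x}, \mathbf{x}\rangle}\langle A\mathbf{x}, \mathbf{x}\rangle && \text{Theorem HMIP} \\
&= \frac{1}{\langle\mathbf{x}, \mathbf{x}\rangle}\langle\lambda\mathbf{x}, \mathbf{x}\rangle && \text{Definition EEM} \\
&= \frac{1}{\langle\mathbf{x}, \mathbf{x}\rangle}\overline{\lambda}\langle\mathbf{x}, \mathbf{x}\rangle && \text{Theorem IPSM} \\
&= \overline{\lambda} && \text{Property MICN}
\end{align*}

If a complex number is equal to its conjugate, then it has an imaginary part equal to zero, and therefore is a real number. ■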


Notice  the  appealing  symmetry  to  the  justifications  given  for  the  steps  of  this 
proof.  In  the  center  is  the  ability  to  pitch  a Hermitian  matrix  from  one  side  of  the 
inner  product  to  the  other. 

Look  back  and  compare  Example  ESMS4  and  Example  CEMS6.  In  Example 
CEMS6  the  matrix  has  only  real  entries,  yet  the  characteristic  polynomial  has  roots 
that  are  complex  numbers,  and  so  the  matrix  has  complex  eigenvalues.  However,  in 
Example  ESMS4,  the  matrix  has  only  real  entries,  but  is  also  symmetric,  and  hence 
Hermitian.  So  by  Theorem  HMRE,  we  were  guaranteed  eigenvalues  that  are  real 
numbers. 

In  many  physical  problems,  a matrix  of  interest  will  be  real  and  symmetric,  or 
Hermitian.  Then  if  the  eigenvalues  are  to  represent  physical  quantities  of  interest, 
Theorem HMRE guarantees that these values will be real numbers.

The  eigenvectors  of  a Hermitian  matrix  also  enjoy  a pleasing  property  that  we 
will  exploit  later. 

Theorem  HMOE  Hermitian  Matrices  have  Orthogonal  Eigenvectors 
Suppose  that  A is  a Hermitian  matrix  and  x and  y are  two  eigenvectors  of  A for 
different  eigenvalues.  Then  x and  y are  orthogonal  vectors. 


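Proof. Let $\mathbf{x}$ be an eigenvector of $A$ for $\lambda$ and let $\mathbf{y}$ be an eigenvector of $A$ for a different eigenvalue $\rho$. So we have $\lambda - \rho \ne 0$. Then

\begin{align*}
\langle\mathbf{x}, \mathbf{y}\rangle &= \frac{1}{\lambda - \rho}(\lambda - \rho)\langle\mathbf{x}, \mathbf{y}\rangle && \text{Property MICN} \\
&= \frac{1}{\lambda - \rho}\left(\lambda\langle\mathbf{x}, \mathbf{y}\rangle - \rho\langle\mathbf{x}, \mathbf{y}\rangle\right) && \text{Property DCN} \\
&= \frac{1}{\lambda - \rho}\left(\langle\overline{\lambda}\mathbf{x}, \mathbf{y}\rangle - \langle\mathbf{x}, \rho\mathbf{y}\rangle\right) && \text{Theorem IPSM} \\
&= \frac{1}{\lambda - \rho}\left(\langle\lambda\mathbf{x}, \mathbf{y}\rangle - \langle\mathbf{x}, \rho\mathbf{y}\rangle\right) && \text{Theorem HMRE} \\
&= \frac{1}{\lambda - \rho}\left(\langle A\mathbf{x}, \mathbf{y}\rangle - \langle\mathbf{x}, A\mathbf{y}\rangle\right) && \text{Definition EEM} \\
&= \frac{1}{\lambda - \rho}\left(\langle A\mathbf{x}, \mathbf{y}\rangle - \langle A\mathbf{x}, \mathbf{y}\rangle\right) && \text{Theorem HMIP} \\
&= \frac{1}{\lambda - \rho}(0) && \text{Property AICN} \\
&= 0
\end{align*}

This equality says that $\mathbf{x}$ and $\mathbf{y}$ are orthogonal vectors (Definition OV). ■

Notice again how the key step in this proof is the fundamental property of a Hermitian matrix (Theorem HMIP): the ability to swap $A$ across the two arguments of the inner product. We will build on these results and continue to see some more interesting properties in Section OD.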
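Both Theorem HMRE and Theorem HMOE can be observed on a small complex matrix. The following is a sketch in plain Python: the $2 \times 2$ Hermitian matrix and its eigenpairs are chosen for illustration and worked out by hand, and the helper `inner` assumes the inner product conjugates its first argument, as the proofs above use. The helper names are hypothetical.

```python
def matvec(A, v):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def inner(u, v):
    """Inner product, conjugating the first argument (assumed convention)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

A = [[2, 1 - 1j],
     [1 + 1j, 3]]
n = len(A)

# Hermitian: each entry equals the conjugate of its mirror image.
is_hermitian = all(A[i][j] == complex(A[j][i]).conjugate()
                   for i in range(n) for j in range(n))

# Two eigenpairs found by hand; p_A(x) = x^2 - 5x + 4 = (x - 1)(x - 4).
lam, x = 1, [1 - 1j, -1]
rho, y = 4, [1 - 1j, 2]
assert matvec(A, x) == [lam * xi for xi in x]
assert matvec(A, y) == [rho * yi for yi in y]

# Theorem HMRE: both eigenvalues are real.
# Theorem HMOE: eigenvectors for different eigenvalues are orthogonal.
orthogonal = inner(x, y) == 0

print(is_hermitian, orthogonal)
```

The entries are small Gaussian integers, so the floating-point complex arithmetic is exact here; in general a tolerance would replace the exact comparisons.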

Reading  Questions 

1.  How  can  you  identify  a nonsingular  matrix  just  by  looking  at  its  eigenvalues? 

2.  How  many  different  eigenvalues  may  a square  matrix  of  size  n have? 

3.  What  is  amazing  about  the  eigenvalues  of  a Hermitian  matrix  and  why  is  it  amazing? 

Exercises 

T10† Suppose that $A$ is a square matrix. Prove that the constant term of the characteristic polynomial of $A$ is equal to the determinant of $A$.

T20† Suppose that $A$ is a square matrix. Prove that a single vector may not be an eigenvector of $A$ for two different eigenvalues.

T22 Suppose that $U$ is a unitary matrix with eigenvalue $\lambda$. Prove that $\lambda$ has modulus 1, i.e. $|\lambda| = 1$. This says that all of the eigenvalues of a unitary matrix lie on the unit circle of the complex plane.

T30 Theorem DCP tells us that the characteristic polynomial of a square matrix of size $n$ has degree $n$. By suitably augmenting the proof of Theorem DCP prove that the coefficient of $x^n$ in the characteristic polynomial is $(-1)^n$.

T50† Theorem EIM says that if $\lambda$ is an eigenvalue of the nonsingular matrix $A$, then $\frac{1}{\lambda}$ is an eigenvalue of $A^{-1}$. Write an alternate proof of this theorem using the characteristic polynomial and without making reference to an eigenvector of $A$ for $\lambda$.


Section  SD 

Similarity  and  Diagonalization 

This  section’s  topic  will  perhaps  seem  out  of  place  at  first,  but  we  will  make  the 
connection  soon  with  eigenvalues  and  eigenvectors.  This  is  also  our  first  look  at  one 
of  the  central  ideas  of  Chapter  R. 

Subsection  SM 
Similar  Matrices 

The  notion  of  matrices  being  “similar”  is  a lot  like  saying  two  matrices  are  row- 
equivalent.  Two  similar  matrices  are  not  equal,  but  they  share  many  important 
properties.  This  section,  and  later  sections  in  Chapter  R will  be  devoted  in  part  to 
discovering  just  what  these  common  properties  are. 

First,  the  main  definition  for  this  section. 

Definition  SIM  Similar  Matrices 

Suppose $A$ and $B$ are two square matrices of size $n$. Then $A$ and $B$ are similar if there exists a nonsingular matrix of size $n$, $S$, such that $A = S^{-1}BS$. □

We will say "$A$ is similar to $B$ via $S$" when we want to emphasize the role of $S$ in the relationship between $A$ and $B$. Also, it does not matter if we say $A$ is similar to $B$, or $B$ is similar to $A$. If one statement is true then so is the other, as can be seen by using $S^{-1}$ in place of $S$ (see Theorem SER for the careful proof). Finally, we will refer to $S^{-1}BS$ as a similarity transformation when we want to emphasize the way $S$ changes $B$. OK, enough about language, let us build a few examples.

Example  SMS5  Similar  matrices  of  size  5 

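If you wondered if there are examples of similar matrices, then it will not be hard to convince you they exist. Define

\[
B = \begin{bmatrix} -4 & 1 & -3 & -2 & 2 \\ 1 & 2 & -1 & 3 & -2 \\ -4 & 1 & 3 & 2 & 2 \\ -3 & 4 & -2 & -1 & -3 \\ 3 & 1 & -1 & 1 & -4 \end{bmatrix}
\qquad
S = \begin{bmatrix} 1 & 2 & -1 & 1 & 1 \\ 0 & 1 & -1 & -2 & -1 \\ 1 & 3 & -1 & 1 & 1 \\ -2 & -3 & 3 & 1 & -2 \\ 1 & 3 & -1 & 2 & 1 \end{bmatrix}
\]

Check that $S$ is nonsingular and then compute

\begin{align*}
A &= S^{-1}BS \\
&= \begin{bmatrix} 10 & 1 & 0 & 2 & -5 \\ -1 & 0 & 1 & 0 & 0 \\ 3 & 0 & 2 & 1 & -3 \\ 0 & 0 & -1 & 0 & 1 \\ -4 & -1 & 1 & -1 & 1 \end{bmatrix}
\begin{bmatrix} -4 & 1 & -3 & -2 & 2 \\ 1 & 2 & -1 & 3 & -2 \\ -4 & 1 & 3 & 2 & 2 \\ -3 & 4 & -2 & -1 & -3 \\ 3 & 1 & -1 & 1 & -4 \end{bmatrix}
\begin{bmatrix} 1 & 2 & -1 & 1 & 1 \\ 0 & 1 & -1 & -2 & -1 \\ 1 & 3 & -1 & 1 & 1 \\ -2 & -3 & 3 & 1 & -2 \\ 1 & 3 & -1 & 2 & 1 \end{bmatrix} \\
&= \begin{bmatrix} -10 & -27 & -29 & -80 & -25 \\ -2 & 6 & 6 & 10 & -2 \\ -3 & 11 & -9 & -14 & -9 \\ -1 & -13 & 0 & -10 & -1 \\ 11 & 35 & 6 & 49 & 19 \end{bmatrix}
\end{align*}

So by this construction, we know that $A$ and $B$ are similar. △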
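The arithmetic in Example SMS5 is tedious by hand, so here is a sketch of the check in plain Python with integer arithmetic. The matrices are copied from the example; the helper `matmul` is a hypothetical name introduced only for this sketch.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

B = [[-4, 1, -3, -2, 2],
     [1, 2, -1, 3, -2],
     [-4, 1, 3, 2, 2],
     [-3, 4, -2, -1, -3],
     [3, 1, -1, 1, -4]]

S = [[1, 2, -1, 1, 1],
     [0, 1, -1, -2, -1],
     [1, 3, -1, 1, 1],
     [-2, -3, 3, 1, -2],
     [1, 3, -1, 2, 1]]

# The inverse of S as displayed in the example.
S_inv = [[10, 1, 0, 2, -5],
         [-1, 0, 1, 0, 0],
         [3, 0, 2, 1, -3],
         [0, 0, -1, 0, 1],
         [-4, -1, 1, -1, 1]]

n = len(S)
identity = [[int(i == j) for j in range(n)] for i in range(n)]

# S has an inverse, so it is nonsingular (Theorem NI).
assert matmul(S_inv, S) == identity
assert matmul(S, S_inv) == identity

# The similarity transformation of B by S.
A = matmul(matmul(S_inv, B), S)

for row in A:
    print(row)
```

Verifying that the displayed inverse really satisfies $S^{-1}S = I_5$ is what justifies the instruction "check that $S$ is nonsingular" without a separate row reduction.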

Let  us  do  that  again. 


Example  SMS3  Similar  matrices  of  size  3 
Define

\[
B = \begin{bmatrix} -13 & -8 & -4 \\ 12 & 7 & 4 \\ 24 & 16 & 7 \end{bmatrix}
\qquad
S = \begin{bmatrix} 1 & 1 & 2 \\ -2 & -1 & -3 \\ 1 & -2 & 0 \end{bmatrix}
\]

Check that $S$ is nonsingular and then compute

\begin{align*}
A &= S^{-1}BS \\
&= \begin{bmatrix} -6 & -4 & -1 \\ -3 & -2 & -1 \\ 5 & 3 & 1 \end{bmatrix}
\begin{bmatrix} -13 & -8 & -4 \\ 12 & 7 & 4 \\ 24 & 16 & 7 \end{bmatrix}
\begin{bmatrix} 1 & 1 & 2 \\ -2 & -1 & -3 \\ 1 & -2 & 0 \end{bmatrix} \\
&= \begin{bmatrix} -1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{bmatrix}
\end{align*}

So by this construction, we know that $A$ and $B$ are similar. But before we move on, look at how pleasing the form of $A$ is. Not convinced? Then consider that several computations related to $A$ are especially easy. For example, in the spirit of Example DUTM, $\det(A) = (-1)(3)(-1) = 3$. Similarly, the characteristic polynomial is straightforward to compute by hand, $p_A(x) = (-1 - x)(3 - x)(-1 - x) = -(x - 3)(x + 1)^2$, and since the result is already factored, the eigenvalues are transparently $\lambda = 3,\ -1$. Finally, the eigenvectors of $A$ are just the standard unit vectors (Definition SUV). △


Subsection  PSM 
Properties  of  Similar  Matrices 

Similar  matrices  share  many  properties  and  it  is  these  theorems  that  justify  the 
choice  of  the  word  “similar.”  First  we  will  show  that  similarity  is  an  equivalence 
relation.  Equivalence  relations  are  important  in  the  study  of  various  algebras  and 
can  always  be  regarded  as  a kind  of  weak  version  of  equality.  Sort  of  alike,  but  not 
quite  equal.  The  notion  of  two  matrices  being  row-equivalent  is  an  example  of  an 
equivalence  relation  we  have  been  working  with  since  the  beginning  of  the  course  (see 
Exercise  RREF.T11).  Row-equivalent  matrices  are  not  equal,  but  they  are  a lot  alike. 
For example, row-equivalent matrices have the same rank. Formally, an equivalence relation requires three conditions hold: reflexive, symmetric and transitive. We will illustrate these as we prove that similarity is an equivalence relation.

Theorem  SER  Similarity  is  an  Equivalence  Relation 
Suppose  A,  B and  C are  square  matrices  of  size  n.  Then 


1.  A is  similar  to  A.  (Reflexive) 

2.  If  A is  similar  to  B,  then  B is  similar  to  A.  (Symmetric) 

3.  If  A is  similar  to  B and  B is  similar  to  C,  then  A is  similar  to  C . (Transitive) 


Proof. To see that $A$ is similar to $A$, we need only demonstrate a nonsingular matrix that effects a similarity transformation of $A$ to $A$. $I_n$ is nonsingular (since it row-reduces to the identity matrix, Theorem NMRRI), and

\[
I_n^{-1}AI_n = I_nAI_n = A
\]

If we assume that $A$ is similar to $B$, then we know there is a nonsingular matrix $S$ so that $A = S^{-1}BS$ by Definition SIM. By Theorem MIMI, $S^{-1}$ is invertible, and by Theorem NI is therefore nonsingular. So

\begin{align*}
(S^{-1})^{-1}A(S^{-1}) &= SAS^{-1} && \text{Theorem MIMI} \\
&= SS^{-1}BSS^{-1} && \text{Definition SIM} \\
&= (SS^{-1})B(SS^{-1}) && \text{Theorem MMA} \\
&= I_nBI_n && \text{Definition MI} \\
&= B && \text{Theorem MMIM}
\end{align*}

and we see that $B$ is similar to $A$.

Assume that $A$ is similar to $B$, and $B$ is similar to $C$. This gives us the existence of two nonsingular matrices, $S$ and $R$, such that $A = S^{-1}BS$ and $B = R^{-1}CR$, by Definition SIM. (Notice how we have to assume $S \ne R$, as will usually be the case.) Since $S$ and $R$ are invertible, so too $RS$ is invertible by Theorem SS and then nonsingular by Theorem NI. Now

\begin{align*}
(RS)^{-1}C(RS) &= S^{-1}R^{-1}CRS && \text{Theorem SS} \\
&= S^{-1}\left(R^{-1}CR\right)S && \text{Theorem MMA} \\
&= S^{-1}BS && \text{Definition SIM} \\
&= A
\end{align*}

so $A$ is similar to $C$ via the nonsingular matrix $RS$. ■


Here is another theorem that tells us exactly what sorts of properties similar matrices share.

Theorem SMEE Similar Matrices have Equal Eigenvalues

Suppose $A$ and $B$ are similar matrices. Then the characteristic polynomials of $A$ and $B$ are equal, that is, $p_A(x) = p_B(x)$.

Proof. Let $n$ denote the size of $A$ and $B$. Since $A$ and $B$ are similar, there exists a nonsingular matrix $S$, such that $A = S^{-1}BS$ (Definition SIM). Then

\begin{align*}
p_A(x) &= \det(A - xI_n) && \text{Definition CP} \\
&= \det\left(S^{-1}BS - xI_n\right) && \text{Definition SIM} \\
&= \det\left(S^{-1}BS - xS^{-1}I_nS\right) && \text{Theorem MMIM} \\
&= \det\left(S^{-1}BS - S^{-1}xI_nS\right) && \text{Theorem MMSMM} \\
&= \det\left(S^{-1}(B - xI_n)S\right) && \text{Theorem MMDAA} \\
&= \det(S^{-1})\det(B - xI_n)\det(S) && \text{Theorem DRMM} \\
&= \det(S^{-1})\det(S)\det(B - xI_n) && \text{Property CMCN} \\
&= \det(S^{-1}S)\det(B - xI_n) && \text{Theorem DRMM} \\
&= \det(I_n)\det(B - xI_n) && \text{Definition MI} \\
&= 1\det(B - xI_n) && \text{Definition DM} \\
&= p_B(x) && \text{Definition CP}
\end{align*}
■
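Theorem SMEE can be spot-checked on the similar matrices of Example SMS3. The sketch below, in plain Python, avoids symbolic algebra: two polynomials of degree 3 that agree at 4 distinct points must be equal, so it is enough to compare $\det(M - tI_3)$ at $t = 0, 1, 2, 3$. The helper names `det` and `char_poly_at` are hypothetical, and the cofactor expansion used here is chosen for brevity, not efficiency.

```python
def det(M):
    """Determinant by cofactor expansion about the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        # Submatrix with row 0 and column j removed.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def char_poly_at(M, t):
    """Value of the characteristic polynomial det(M - t I) at the number t."""
    n = len(M)
    return det([[M[i][j] - (t if i == j else 0) for j in range(n)]
                for i in range(n)])

# The similar matrices A and B of Example SMS3.
A = [[-1, 0, 0],
     [0, 3, 0],
     [0, 0, -1]]
B = [[-13, -8, -4],
     [12, 7, 4],
     [24, 16, 7]]

# Degree-3 polynomials agreeing at 4 points are identical.
samples = range(4)
values_A = [char_poly_at(A, t) for t in samples]
values_B = [char_poly_at(B, t) for t in samples]

print(values_A == values_B)
```

The value at $t = 0$ is the determinant, and the value at $t = 3$ is zero because 3 is an eigenvalue of both matrices.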

So  similar  matrices  not  only  have  the  same  set  of  eigenvalues,  the  algebraic 
multiplicities  of  these  eigenvalues  will  also  be  the  same.  However,  be  careful  with 
this  theorem.  It  is  tempting  to  think  the  converse  is  true,  and  argue  that  if  two 
matrices  have  the  same  eigenvalues,  then  they  are  similar.  Not  so,  as  the  following 
example  illustrates. 

Example  EENS  Equal  eigenvalues,  not  similar 
Define

\[
A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\qquad
B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]

and check that

\[
p_A(x) = p_B(x) = 1 - 2x + x^2 = (x - 1)^2
\]

and so $A$ and $B$ have equal characteristic polynomials. If the converse of Theorem SMEE were true, then $A$ and $B$ would be similar. Suppose this is the case. More precisely, suppose there is a nonsingular matrix $S$ so that $A = S^{-1}BS$.

Then

\[
A = S^{-1}BS = S^{-1}I_2S = S^{-1}S = I_2
\]

Clearly $A \ne I_2$ and this contradiction tells us that the converse of Theorem SMEE is false. △
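Example EENS is short enough to replay numerically. The sketch below, in plain Python, compares the two characteristic polynomials through their coefficients and then applies one similarity transformation to the identity matrix. The matrix $S$ is a hypothetical choice made only for this sketch; the argument in the example shows the outcome is the same for every nonsingular $S$.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def char_poly_2x2(M):
    """Coefficients (c0, c1, c2) of det(M - xI) = c0 + c1 x + c2 x^2."""
    (a, b), (c, d) = M
    return (a * d - b * c, -(a + d), 1)

A = [[1, 1],
     [0, 1]]
B = [[1, 0],
     [0, 1]]   # the identity matrix I_2

# Equal characteristic polynomials: both are 1 - 2x + x^2.
same_polynomial = char_poly_2x2(A) == char_poly_2x2(B)

# One hypothetical nonsingular S (determinant 1), with its inverse.
S = [[2, 1],
     [1, 1]]
S_inv = [[1, -1],
         [-1, 2]]
assert matmul(S_inv, S) == B

# A similarity transformation of the identity returns the identity,
# so it can never produce A.
conjugated = matmul(matmul(S_inv, B), S)

print(same_polynomial, conjugated == A)
```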


Subsection  D 
Diagonalization 

Good  things  happen  when  a matrix  is  similar  to  a diagonal  matrix.  For  example,  the 
eigenvalues  of  the  matrix  are  the  entries  on  the  diagonal  of  the  diagonal  matrix.  And 
it  can  be  a much  simpler  matter  to  compute  high  powers  of  the  matrix.  Diagonalizable 
matrices  are  also  of  interest  in  more  abstract  settings.  Here  are  the  relevant  definitions, 
then  our  main  theorem  for  this  section. 


Definition  DIM  Diagonal  Matrix 

Suppose that $A$ is a square matrix. Then $A$ is a diagonal matrix if $[A]_{ij} = 0$
whenever $i \neq j$. □

Definition  DZM  Diagonalizable  Matrix 

Suppose  A is  a square  matrix.  Then  A is  diagonalizable  if  A is  similar  to  a diagonal 
matrix.  □ 


Example  DAB  Diagonalization  of  Archetype  B 
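Archetype B has a 3 × 3 coefficient matrix

\[
B = \begin{bmatrix} -7 & -6 & -12 \\ 5 & 5 & 7 \\ 1 & 0 & 4 \end{bmatrix}
\]

and is similar to a diagonal matrix, as can be seen by the following computation
with the nonsingular matrix $S$,

\begin{align*}
S^{-1}BS &=
\begin{bmatrix} -5 & -3 & -2 \\ 3 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix}^{-1}
\begin{bmatrix} -7 & -6 & -12 \\ 5 & 5 & 7 \\ 1 & 0 & 4 \end{bmatrix}
\begin{bmatrix} -5 & -3 & -2 \\ 3 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix} \\
&=
\begin{bmatrix} -1 & -1 & -1 \\ 2 & 3 & 1 \\ -1 & -2 & 1 \end{bmatrix}
\begin{bmatrix} -7 & -6 & -12 \\ 5 & 5 & 7 \\ 1 & 0 & 4 \end{bmatrix}
\begin{bmatrix} -5 & -3 & -2 \\ 3 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix} \\
&=
\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}
\end{align*}

△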
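The computation in Example DAB can be reproduced exactly with rational arithmetic. The sketch below uses only Python's standard library; the helpers `mat_mul` and `inverse` are our own (a textbook Gauss-Jordan elimination), not something the text prescribes, and the inverse routine assumes its input is nonsingular.

```python
from fractions import Fraction

def mat_mul(x, y):
    """Matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

def inverse(m):
    """Invert a nonsingular square matrix by Gauss-Jordan elimination on [M | I]."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Find a row with a nonzero entry in this column (fails if M is singular).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

B = [[-7, -6, -12], [5, 5, 7], [1, 0, 4]]
S = [[-5, -3, -2], [3, 2, 1], [1, 1, 1]]

S_inv = inverse(S)
D = mat_mul(mat_mul(S_inv, B), S)   # the similarity transformation S^{-1} B S
```

With these inputs `D` is the diagonal matrix with entries −1, 1, 2, matching the example.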


Example  SMS3  provides  yet  another  example  of  a matrix  that  is  subjected  to 
a similarity  transformation  and  the  result  is  a diagonal  matrix.  Alright,  just  how 
would  we  find  the  magic  matrix  S that  can  be  used  in  a similarity  transformation 
to  produce  a diagonal  matrix?  Before  you  read  the  statement  of  the  next  theorem, 
you might study the eigenvalues and eigenvectors of Archetype B and compute the
eigenvalues and eigenvectors of the matrix in Example SMS3.

Theorem  DC  Diagonalization  Characterization 

Suppose  A is  a square  matrix  of  size  n.  Then  A is  diagonalizable  if  and  only  if  there 
exists  a linearly  independent  set  S that  contains  n eigenvectors  of  A. 


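Proof. (⇐) Let $S = \{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_n\}$ be a linearly independent set of eigenvectors
of $A$ for the eigenvalues $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_n$. Recall Definition SUV and define

\begin{align*}
R &= [\mathbf{x}_1 | \mathbf{x}_2 | \mathbf{x}_3 | \ldots | \mathbf{x}_n] \\
D &= \begin{bmatrix}
\lambda_1 & 0 & 0 & \cdots & 0 \\
0 & \lambda_2 & 0 & \cdots & 0 \\
0 & 0 & \lambda_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & \lambda_n
\end{bmatrix}
= [\lambda_1\mathbf{e}_1 | \lambda_2\mathbf{e}_2 | \lambda_3\mathbf{e}_3 | \ldots | \lambda_n\mathbf{e}_n]
\end{align*}

The columns of $R$ are the vectors of the linearly independent set $S$ and so by
Theorem NMLIC the matrix $R$ is nonsingular. By Theorem NI we know $R^{-1}$ exists.

\begin{align*}
R^{-1}AR &= R^{-1}A[\mathbf{x}_1 | \mathbf{x}_2 | \mathbf{x}_3 | \ldots | \mathbf{x}_n] \\
&= R^{-1}[A\mathbf{x}_1 | A\mathbf{x}_2 | A\mathbf{x}_3 | \ldots | A\mathbf{x}_n] && \text{Definition MM} \\
&= R^{-1}[\lambda_1\mathbf{x}_1 | \lambda_2\mathbf{x}_2 | \lambda_3\mathbf{x}_3 | \ldots | \lambda_n\mathbf{x}_n] && \text{Definition EEM} \\
&= R^{-1}[\lambda_1R\mathbf{e}_1 | \lambda_2R\mathbf{e}_2 | \lambda_3R\mathbf{e}_3 | \ldots | \lambda_nR\mathbf{e}_n] && \text{Definition MVP} \\
&= R^{-1}[R(\lambda_1\mathbf{e}_1) | R(\lambda_2\mathbf{e}_2) | R(\lambda_3\mathbf{e}_3) | \ldots | R(\lambda_n\mathbf{e}_n)] && \text{Theorem MMSMM} \\
&= R^{-1}R[\lambda_1\mathbf{e}_1 | \lambda_2\mathbf{e}_2 | \lambda_3\mathbf{e}_3 | \ldots | \lambda_n\mathbf{e}_n] && \text{Definition MM} \\
&= I_nD && \text{Definition MI} \\
&= D && \text{Theorem MMIM}
\end{align*}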
This  says  that  A is  similar  to  the  diagonal  matrix  D via  the  nonsingular  matrix 
R.  Thus  A is  diagonalizable  (Definition  DZM). 

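(⇒) Suppose that $A$ is diagonalizable, so there is a nonsingular matrix of size $n$

\[
T = [\mathbf{y}_1 | \mathbf{y}_2 | \mathbf{y}_3 | \ldots | \mathbf{y}_n]
\]

and a diagonal matrix (recall Definition SUV)

\[
E = \begin{bmatrix}
d_1 & 0 & 0 & \cdots & 0 \\
0 & d_2 & 0 & \cdots & 0 \\
0 & 0 & d_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & d_n
\end{bmatrix}
= [d_1\mathbf{e}_1 | d_2\mathbf{e}_2 | d_3\mathbf{e}_3 | \ldots | d_n\mathbf{e}_n]
\]

such that $T^{-1}AT = E$. Then consider,

\begin{align*}
[A\mathbf{y}_1 | A\mathbf{y}_2 | A\mathbf{y}_3 | \ldots | A\mathbf{y}_n]
&= A[\mathbf{y}_1 | \mathbf{y}_2 | \mathbf{y}_3 | \ldots | \mathbf{y}_n] && \text{Definition MM} \\
&= AT \\
&= I_nAT && \text{Theorem MMIM} \\
&= TT^{-1}AT && \text{Definition MI} \\
&= TE \\
&= T[d_1\mathbf{e}_1 | d_2\mathbf{e}_2 | d_3\mathbf{e}_3 | \ldots | d_n\mathbf{e}_n] \\
&= [T(d_1\mathbf{e}_1) | T(d_2\mathbf{e}_2) | T(d_3\mathbf{e}_3) | \ldots | T(d_n\mathbf{e}_n)] && \text{Definition MM} \\
&= [d_1T\mathbf{e}_1 | d_2T\mathbf{e}_2 | d_3T\mathbf{e}_3 | \ldots | d_nT\mathbf{e}_n] && \text{Definition MM} \\
&= [d_1\mathbf{y}_1 | d_2\mathbf{y}_2 | d_3\mathbf{y}_3 | \ldots | d_n\mathbf{y}_n] && \text{Definition MVP}
\end{align*}

This equality of matrices (Definition ME) allows us to conclude that the individual
columns are equal vectors (Definition CVE). That is, $A\mathbf{y}_i = d_i\mathbf{y}_i$ for $1 \le i \le n$.
In other words, $\mathbf{y}_i$ is an eigenvector of $A$ for the eigenvalue $d_i$, $1 \le i \le n$. (Why
does $\mathbf{y}_i \neq \mathbf{0}$?) Because $T$ is nonsingular, the set containing $T$'s columns,
$S = \{\mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3, \ldots, \mathbf{y}_n\}$, is a linearly independent set (Theorem NMLIC). So the set $S$
has all the required properties. ■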


Notice that the proof of Theorem DC is constructive. To diagonalize a matrix,
we need only locate $n$ linearly independent eigenvectors. Then we can construct
a nonsingular matrix using the eigenvectors as columns ($R$) so that $R^{-1}AR$ is a
diagonal matrix ($D$). The entries on the diagonal of $D$ will be the eigenvalues of the
eigenvectors used to create $R$, in the same order as the eigenvectors appear in $R$.
We illustrate this by diagonalizing some matrices.


Example  DMS3  Diagonalizing  a matrix  of  size  3 
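Consider the matrix

\[
F = \begin{bmatrix} -13 & -8 & -4 \\ 12 & 7 & 4 \\ 24 & 16 & 7 \end{bmatrix}
\]

of Example CPMS3, Example EMS3 and Example ESMS3. $F$'s eigenvalues and
eigenspaces are

\begin{align*}
\lambda &= 3 & \mathcal{E}_F(3) &= \left\langle\left\{ \begin{bmatrix} -\frac{1}{2} \\ \frac{1}{2} \\ 1 \end{bmatrix} \right\}\right\rangle \\
\lambda &= -1 & \mathcal{E}_F(-1) &= \left\langle\left\{ \begin{bmatrix} -\frac{2}{3} \\ 1 \\ 0 \end{bmatrix},\ \begin{bmatrix} -\frac{1}{3} \\ 0 \\ 1 \end{bmatrix} \right\}\right\rangle
\end{align*}

Define the matrix $S$ to be the 3 × 3 matrix whose columns are the three basis
vectors in the eigenspaces for $F$,

\[
S = \begin{bmatrix} -\frac{1}{2} & -\frac{2}{3} & -\frac{1}{3} \\ \frac{1}{2} & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\]

Check that $S$ is nonsingular (row-reduces to the identity matrix, Theorem NMRRI
or has a nonzero determinant, Theorem SMZD). Then the three columns of $S$ are a
linearly independent set (Theorem NMLIC). By Theorem DC we now know that $F$
is diagonalizable. Furthermore, the construction in the proof of Theorem DC tells us
that if we apply the matrix $S$ to $F$ in a similarity transformation, the result will be
a diagonal matrix with the eigenvalues of $F$ on the diagonal. The eigenvalues appear
on the diagonal of the matrix in the same order as the eigenvectors appear in $S$. So,

\begin{align*}
S^{-1}FS &=
\begin{bmatrix} -\frac{1}{2} & -\frac{2}{3} & -\frac{1}{3} \\ \frac{1}{2} & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}^{-1}
\begin{bmatrix} -13 & -8 & -4 \\ 12 & 7 & 4 \\ 24 & 16 & 7 \end{bmatrix}
\begin{bmatrix} -\frac{1}{2} & -\frac{2}{3} & -\frac{1}{3} \\ \frac{1}{2} & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \\
&=
\begin{bmatrix} 6 & 4 & 2 \\ -3 & -1 & -1 \\ -6 & -4 & -1 \end{bmatrix}
\begin{bmatrix} -13 & -8 & -4 \\ 12 & 7 & 4 \\ 24 & 16 & 7 \end{bmatrix}
\begin{bmatrix} -\frac{1}{2} & -\frac{2}{3} & -\frac{1}{3} \\ \frac{1}{2} & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \\
&=
\begin{bmatrix} 3 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}
\end{align*}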

Note that the above computations can be viewed two ways. The proof of Theorem
DC tells us that the four matrices ($F$, $S$, $S^{-1}$ and the diagonal matrix) will interact
the way we have written the equation. Or as an example, we can actually perform
the computations to verify what the theorem predicts. △
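The heart of the construction in Theorem DC is that each column of the eigenvector matrix satisfies $F\mathbf{x} = \lambda\mathbf{x}$, which is the same as saying $FS = SD$ column by column, and this can be checked without computing any inverse. The sketch below does so for the matrix $F$ of Example DMS3, taking the eigenvalue 3 with eigenvector $(-\tfrac{1}{2}, \tfrac{1}{2}, 1)$ and the eigenvalue −1 with eigenvectors $(-\tfrac{2}{3}, 1, 0)$ and $(-\tfrac{1}{3}, 0, 1)$. The helper `mat_vec` and the variable names are our own.

```python
from fractions import Fraction as Fr

F = [[-13, -8, -4], [12, 7, 4], [24, 16, 7]]

# (eigenvalue, eigenvector) pairs, in the order used for the columns of S.
pairs = [
    (3, [Fr(-1, 2), Fr(1, 2), Fr(1)]),
    (-1, [Fr(-2, 3), Fr(1), Fr(0)]),
    (-1, [Fr(-1, 3), Fr(0), Fr(1)]),
]

def mat_vec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# For each pair, compare F x with lambda * x entry by entry.
checks = [mat_vec(F, x) == [lam * c for c in x] for lam, x in pairs]
```

Every entry of `checks` is true, so the diagonal matrix produced by the similarity transformation must carry 3, −1, −1 on its diagonal, in that order.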


The  dimension  of  an  eigenspace  can  be  no  larger  than  the  algebraic  multiplicity 
of  the  eigenvalue  by  Theorem  ME.  When  every  eigenvalue’s  eigenspace  is  this  large, 
then  we  can  diagonalize  the  matrix,  and  only  then.  Three  examples  we  have  seen  so 
far  in  this  section,  Example  SMS5,  Example  DAB  and  Example  DMS3,  illustrate 
the  diagonalization  of  a matrix,  with  varying  degrees  of  detail  about  just  how  the 
diagonalization  is  achieved.  However,  in  each  case,  you  can  verify  that  the  geometric 
and  algebraic  multiplicities  are  equal  for  every  eigenvalue.  This  is  the  substance  of 
the  next  theorem. 


Theorem  DMFE  Diagonalizable  Matrices  have  Full  Eigenspaces 

Suppose $A$ is a square matrix. Then $A$ is diagonalizable if and only if $\gamma_A(\lambda) = \alpha_A(\lambda)$
for every eigenvalue $\lambda$ of $A$.


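Proof. Suppose $A$ has size $n$ and $k$ distinct eigenvalues, $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_k$. Let
$S_i = \{\mathbf{x}_{i1}, \mathbf{x}_{i2}, \mathbf{x}_{i3}, \ldots, \mathbf{x}_{i\gamma_A(\lambda_i)}\}$ denote a basis for the eigenspace of $\lambda_i$, $\mathcal{E}_A(\lambda_i)$,
for $1 \le i \le k$. Then

\[
S = S_1 \cup S_2 \cup S_3 \cup \cdots \cup S_k
\]

is a set of eigenvectors for $A$. A vector cannot be an eigenvector for two different
eigenvalues (see Exercise EE.T20) so $S_i \cap S_j = \emptyset$ whenever $i \neq j$. In other words, $S$
is a disjoint union of $S_i$, $1 \le i \le k$.

(⇐) The size of $S$ is

\begin{align*}
|S| &= \sum_{i=1}^{k} \gamma_A(\lambda_i) && \text{$S$ disjoint union of $S_i$} \\
&= \sum_{i=1}^{k} \alpha_A(\lambda_i) && \text{Hypothesis} \\
&= n && \text{Theorem NEM}
\end{align*}

We next show that $S$ is a linearly independent set. So we will begin with a relation
of linear dependence on $S$, using doubly-subscripted scalars and eigenvectors,

\begin{align*}
\mathbf{0} = {} & \left(a_{11}\mathbf{x}_{11} + a_{12}\mathbf{x}_{12} + \cdots + a_{1\gamma_A(\lambda_1)}\mathbf{x}_{1\gamma_A(\lambda_1)}\right) + \\
& \left(a_{21}\mathbf{x}_{21} + a_{22}\mathbf{x}_{22} + \cdots + a_{2\gamma_A(\lambda_2)}\mathbf{x}_{2\gamma_A(\lambda_2)}\right) + \\
& \left(a_{31}\mathbf{x}_{31} + a_{32}\mathbf{x}_{32} + \cdots + a_{3\gamma_A(\lambda_3)}\mathbf{x}_{3\gamma_A(\lambda_3)}\right) + \\
& \qquad\vdots \\
& \left(a_{k1}\mathbf{x}_{k1} + a_{k2}\mathbf{x}_{k2} + \cdots + a_{k\gamma_A(\lambda_k)}\mathbf{x}_{k\gamma_A(\lambda_k)}\right)
\end{align*}

Define the vectors $\mathbf{y}_i$, $1 \le i \le k$ by

\begin{align*}
\mathbf{y}_1 &= a_{11}\mathbf{x}_{11} + a_{12}\mathbf{x}_{12} + a_{13}\mathbf{x}_{13} + \cdots + a_{1\gamma_A(\lambda_1)}\mathbf{x}_{1\gamma_A(\lambda_1)} \\
\mathbf{y}_2 &= a_{21}\mathbf{x}_{21} + a_{22}\mathbf{x}_{22} + a_{23}\mathbf{x}_{23} + \cdots + a_{2\gamma_A(\lambda_2)}\mathbf{x}_{2\gamma_A(\lambda_2)} \\
\mathbf{y}_3 &= a_{31}\mathbf{x}_{31} + a_{32}\mathbf{x}_{32} + a_{33}\mathbf{x}_{33} + \cdots + a_{3\gamma_A(\lambda_3)}\mathbf{x}_{3\gamma_A(\lambda_3)} \\
&\ \ \vdots \\
\mathbf{y}_k &= a_{k1}\mathbf{x}_{k1} + a_{k2}\mathbf{x}_{k2} + a_{k3}\mathbf{x}_{k3} + \cdots + a_{k\gamma_A(\lambda_k)}\mathbf{x}_{k\gamma_A(\lambda_k)}
\end{align*}

Then the relation of linear dependence becomes

\[
\mathbf{0} = \mathbf{y}_1 + \mathbf{y}_2 + \mathbf{y}_3 + \cdots + \mathbf{y}_k
\]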

Since the eigenspace $\mathcal{E}_A(\lambda_i)$ is closed under vector addition and scalar
multiplication, $\mathbf{y}_i \in \mathcal{E}_A(\lambda_i)$, $1 \le i \le k$. Thus, for each $i$, the vector $\mathbf{y}_i$ is an eigenvector of
$A$ for $\lambda_i$, or is the zero vector. Recall that sets of eigenvectors whose eigenvalues
are distinct form a linearly independent set by Theorem EDELI. Should any (or
some) $\mathbf{y}_i$ be nonzero, the equation $\mathbf{0} = \mathbf{y}_1 + \mathbf{y}_2 + \cdots + \mathbf{y}_k$ would provide a nontrivial relation of
linear dependence on a set of eigenvectors with distinct eigenvalues, contradicting
Theorem EDELI. Thus $\mathbf{y}_i = \mathbf{0}$, $1 \le i \le k$.

Each of the $k$ equations, $\mathbf{y}_i = \mathbf{0}$, is a relation of linear dependence on the
corresponding set $S_i$, a set of basis vectors for the eigenspace $\mathcal{E}_A(\lambda_i)$, which is
therefore linearly independent. From these relations of linear dependence on linearly
independent sets we conclude that the scalars are all zero, more precisely, $a_{ij} = 0$,
$1 \le j \le \gamma_A(\lambda_i)$ for $1 \le i \le k$. This establishes that the only relation of linear
dependence on $S$ is the trivial one, and hence $S$ is a
linearly independent set.

We have determined that $S$ is a set of $n$ linearly independent eigenvectors for $A$,
and so by Theorem DC, $A$ is diagonalizable.

(⇒) Now we assume that $A$ is diagonalizable. Aiming for a contradiction (Proof
Technique CD), suppose that there is at least one eigenvalue, say $\lambda_t$, such that
$\gamma_A(\lambda_t) \neq \alpha_A(\lambda_t)$. By Theorem ME we must have $\gamma_A(\lambda_t) < \alpha_A(\lambda_t)$, and $\gamma_A(\lambda_i) \le \alpha_A(\lambda_i)$
for $1 \le i \le k$, $i \neq t$.

Since $A$ is diagonalizable, Theorem DC guarantees a set of $n$ linearly independent
vectors, all of which are eigenvectors of $A$; for this half of the proof, let $S$ denote that set. Let $n_i$ denote the number of eigenvectors
in $S$ that are eigenvectors for $\lambda_i$, and recall that a vector cannot be an eigenvector
for two different eigenvalues (Exercise EE.T20). $S$ is a linearly independent set, so
the subset $S_i$ containing the $n_i$ eigenvectors for $\lambda_i$ must also be linearly independent.
Because the eigenspace $\mathcal{E}_A(\lambda_i)$ has dimension $\gamma_A(\lambda_i)$ and $S_i$ is a linearly independent
subset in $\mathcal{E}_A(\lambda_i)$, Theorem G tells us that $n_i \le \gamma_A(\lambda_i)$, for $1 \le i \le k$.

Putting all these facts together gives,

\begin{align*}
n &= n_1 + n_2 + n_3 + \cdots + n_t + \cdots + n_k && \text{Definition SU} \\
&\le \gamma_A(\lambda_1) + \gamma_A(\lambda_2) + \gamma_A(\lambda_3) + \cdots + \gamma_A(\lambda_t) + \cdots + \gamma_A(\lambda_k) && \text{Theorem G} \\
&< \alpha_A(\lambda_1) + \alpha_A(\lambda_2) + \alpha_A(\lambda_3) + \cdots + \alpha_A(\lambda_t) + \cdots + \alpha_A(\lambda_k) && \text{Theorem ME} \\
&= n && \text{Theorem NEM}
\end{align*}

This is a contradiction (we cannot have $n < n$!) and so our assumption that
some eigenspace had less than full dimension was false. ■


Example  SEE,  Example  CAEHW,  Example  ESMS3,  Example  ESMS4,  Example 
DEMS5,  Archetype  B,  Archetype  F,  Archetype  K and  Archetype  L are  all  examples 
of  matrices  that  are  diagonalizable  and  that  illustrate  Theorem  DMFE.  While  we 
have  provided  many  examples  of  matrices  that  are  diagonalizable,  especially  among 
the  archetypes,  there  are  many  matrices  that  are  not  diagonalizable.  Here  is  one 
now. 

Example NDMS4 A non-diagonalizable matrix of size 4
In Example EMMS4 the matrix

\[
B = \begin{bmatrix} -2 & 1 & -2 & -4 \\ 12 & 1 & 4 & 9 \\ 6 & 5 & -2 & -4 \\ 3 & -4 & 5 & 10 \end{bmatrix}
\]

was determined to have characteristic polynomial

\[
p_B(x) = (x - 1)(x - 2)^3
\]

and an eigenspace for $\lambda = 2$ of

\[
\mathcal{E}_B(2) = \left\langle\left\{ \begin{bmatrix} -\frac{1}{2} \\ 1 \\ -\frac{1}{2} \\ 1 \end{bmatrix} \right\}\right\rangle
\]

So the geometric multiplicity of $\lambda = 2$ is $\gamma_B(2) = 1$, while the algebraic
multiplicity is $\alpha_B(2) = 3$. By Theorem DMFE, the matrix $B$ is not diagonalizable.
△
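The geometric multiplicity of an eigenvalue $\lambda$ is the dimension of the null space of $B - \lambda I_n$, that is, $n$ minus the rank of that matrix. The following sketch computes it for the matrix of Example NDMS4 by exact row reduction. The functions `rank` and `geometric_multiplicity` are our own illustrative helpers, written with Python's standard `fractions` module.

```python
from fractions import Fraction

def rank(m):
    """Rank of a matrix, found by row reduction over the rationals."""
    rows = [[Fraction(v) for v in row] for row in m]
    r = 0  # number of pivot rows found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def geometric_multiplicity(m, lam):
    """Dimension of the null space of M - lam*I."""
    n = len(m)
    shifted = [[m[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

B = [[-2, 1, -2, -4], [12, 1, 4, 9], [6, 5, -2, -4], [3, -4, 5, 10]]
gamma_2 = geometric_multiplicity(B, 2)
gamma_1 = geometric_multiplicity(B, 1)
```

Both multiplicities come out as 1, so the eigenspaces together supply only two linearly independent eigenvectors for a matrix of size 4, which is another way to see that $B$ cannot be diagonalized.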


Archetype A is the lone archetype with a square matrix that is not diagonalizable,
as the algebraic and geometric multiplicities of the eigenvalue $\lambda = 0$ differ. Example
HMEM5 is another example of a matrix that cannot be diagonalized due to the
difference between the geometric and algebraic multiplicities of $\lambda = 2$, as is Example
CEMS6 which has two complex eigenvalues, each with differing multiplicities.
Likewise, Example EMMS4 has an eigenvalue with different algebraic and geometric
multiplicities and so cannot be diagonalized.

Theorem  DED  Distinct  Eigenvalues  implies  Diagonalizable 

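Suppose $A$ is a square matrix of size $n$ with $n$ distinct eigenvalues. Then $A$ is
diagonalizable.


Proof. Let $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_n$ denote the $n$ distinct eigenvalues of $A$. Then by Theorem
NEM we have $n = \sum_{i=1}^{n} \alpha_A(\lambda_i)$. Each algebraic multiplicity is at least 1, and $n$ of them sum to $n$,
which implies that $\alpha_A(\lambda_i) = 1$, $1 \le i \le n$.
From Theorem ME it follows that $\gamma_A(\lambda_i) = 1$, $1 \le i \le n$. So $\gamma_A(\lambda_i) = \alpha_A(\lambda_i)$,
$1 \le i \le n$ and Theorem DMFE says $A$ is diagonalizable. ■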


Example  DEHD  Distinct  eigenvalues,  hence  diagonalizable 
In Example DEMS5 the matrix

\[
H = \begin{bmatrix}
15 & 18 & -8 & 6 & -5 \\
5 & 3 & 1 & -1 & -3 \\
0 & -4 & 5 & -4 & -2 \\
-43 & -46 & 17 & -14 & 15 \\
26 & 30 & -12 & 8 & -10
\end{bmatrix}
\]

has characteristic polynomial

\[
p_H(x) = -x(x - 2)(x - 1)(x + 1)(x + 3)
\]

and so is a 5 × 5 matrix with 5 distinct eigenvalues.

By Theorem DED we know $H$ must be diagonalizable. But just for practice,
we exhibit a diagonalization. The matrix $S$ contains eigenvectors of $H$ as columns,
one from each eigenspace, guaranteeing linearly independent columns and thus the
nonsingularity of $S$. Notice that we are using the versions of the eigenvectors from
Example DEMS5 that have integer entries. The diagonal matrix has the eigenvalues
of $H$ in the same order that their respective eigenvectors appear as the columns of
$S$. With these matrices, verify computationally that $S^{-1}HS = D$.

\[
S = \begin{bmatrix}
2 & 1 & -1 & 1 & 1 \\
-1 & 0 & 2 & 0 & -1 \\
-2 & 0 & 2 & -1 & -2 \\
-4 & -1 & 0 & -2 & -1 \\
2 & 2 & 1 & 2 & 1
\end{bmatrix}
\qquad
D = \begin{bmatrix}
-3 & 0 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 2
\end{bmatrix}
\]

Note that there are many different ways to diagonalize $H$. We could replace
eigenvectors by nonzero scalar multiples, or we could rearrange the order of the eigenvectors
as the columns of $S$ (which would subsequently reorder the eigenvalues along the
diagonal of $D$). △

Archetype  B is  another  example  of  a matrix  that  has  as  many  distinct  eigenvalues 
as  its  size,  and  is  hence  diagonalizable  by  Theorem  DED. 

Powers of a diagonal matrix are easy to compute, and when a matrix is
diagonalizable, it is almost as easy. We could state a theorem here perhaps, but we will
settle instead for an example that makes the point just as well.


Example  HPDM  High  power  of  a diagonalizable  matrix 
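Suppose that

\[
A = \begin{bmatrix}
19 & 0 & 6 & 13 \\
-33 & -1 & -9 & -21 \\
21 & -4 & 12 & 21 \\
-36 & 2 & -14 & -28
\end{bmatrix}
\]

and we wish to compute $A^{20}$. Normally this would require 19 matrix multiplications,
but since $A$ is diagonalizable, we can simplify the computations substantially.

First, we diagonalize $A$. With

\[
S = \begin{bmatrix}
1 & -1 & 2 & -1 \\
-2 & 3 & -3 & 3 \\
1 & 1 & 3 & 3 \\
-2 & 1 & -4 & 0
\end{bmatrix}
\]

we find

\begin{align*}
D = S^{-1}AS &=
\begin{bmatrix}
-6 & 1 & -3 & -6 \\
0 & 2 & -2 & -3 \\
3 & 0 & 1 & 2 \\
-1 & -1 & 1 & 1
\end{bmatrix}
\begin{bmatrix}
19 & 0 & 6 & 13 \\
-33 & -1 & -9 & -21 \\
21 & -4 & 12 & 21 \\
-36 & 2 & -14 & -28
\end{bmatrix}
\begin{bmatrix}
1 & -1 & 2 & -1 \\
-2 & 3 & -3 & 3 \\
1 & 1 & 3 & 3 \\
-2 & 1 & -4 & 0
\end{bmatrix} \\
&=
\begin{bmatrix}
-1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\end{align*}

Now we find an alternate expression for $A^{20}$,

\begin{align*}
A^{20} &= AAA \cdots A \\
&= I_nAI_nAI_nAI_n \cdots I_nAI_n \\
&= (SS^{-1})A(SS^{-1})A(SS^{-1})A(SS^{-1}) \cdots (SS^{-1})A(SS^{-1}) \\
&= S(S^{-1}AS)(S^{-1}AS)(S^{-1}AS) \cdots (S^{-1}AS)S^{-1} \\
&= SDDD \cdots DS^{-1} \\
&= SD^{20}S^{-1}
\end{align*}

and since $D$ is a diagonal matrix, powers are much easier to compute,

\begin{align*}
A^{20} &= S
\begin{bmatrix}
-1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}^{20}
S^{-1} \\
&= S
\begin{bmatrix}
(-1)^{20} & 0 & 0 & 0 \\
0 & (0)^{20} & 0 & 0 \\
0 & 0 & (2)^{20} & 0 \\
0 & 0 & 0 & (1)^{20}
\end{bmatrix}
S^{-1} \\
&=
\begin{bmatrix}
1 & -1 & 2 & -1 \\
-2 & 3 & -3 & 3 \\
1 & 1 & 3 & 3 \\
-2 & 1 & -4 & 0
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 1048576 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
-6 & 1 & -3 & -6 \\
0 & 2 & -2 & -3 \\
3 & 0 & 1 & 2 \\
-1 & -1 & 1 & 1
\end{bmatrix} \\
&=
\begin{bmatrix}
6291451 & 2 & 2097148 & 4194297 \\
-9437175 & -5 & -3145719 & -6291441 \\
9437175 & -2 & 3145728 & 6291453 \\
-12582900 & -2 & -4194298 & -8388596
\end{bmatrix}
\end{align*}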

Notice how we effectively replaced the twentieth power of $A$ by the twentieth
power of $D$, and how a high power of a diagonal matrix is just a collection of powers
of scalars on the diagonal. The price we pay for this simplification is the need to
diagonalize the matrix (by computing eigenvalues and eigenvectors) and finding the
inverse of the matrix of eigenvectors. And we still need to do two matrix products.
But the higher the power, the greater the savings. △
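The shortcut of Example HPDM is easy to confirm with integer arithmetic. The sketch below builds $D^{20}$ from the four eigenvalues, forms $SD^{20}S^{-1}$ with two matrix products, and compares the result with the brute-force computation that uses 19 products. The helper `mat_mul` and the variable names are our own; the matrices are those of the example.

```python
def mat_mul(x, y):
    """Matrix product of nested lists of integers."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

A = [[19, 0, 6, 13], [-33, -1, -9, -21], [21, -4, 12, 21], [-36, 2, -14, -28]]
S = [[1, -1, 2, -1], [-2, 3, -3, 3], [1, 1, 3, 3], [-2, 1, -4, 0]]
S_inv = [[-6, 1, -3, -6], [0, 2, -2, -3], [3, 0, 1, 2], [-1, -1, 1, 1]]
eigenvalues = [-1, 0, 2, 1]

# D^20: raise each diagonal entry to the 20th power.
D20 = [[eigenvalues[i] ** 20 if i == j else 0 for j in range(4)]
       for i in range(4)]
via_diagonalization = mat_mul(mat_mul(S, D20), S_inv)

# Direct computation: 19 successive multiplications by A.
direct = A
for _ in range(19):
    direct = mat_mul(direct, A)
```

The two results agree entry for entry, and the first row is `[6291451, 2, 2097148, 4194297]` as in the example.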
Subsection FS
Fibonacci Sequences


Example  FSCF  Fibonacci  sequence,  closed  form 

The Fibonacci sequence is a sequence of integers defined recursively by

\[
a_0 = 0 \qquad a_1 = 1 \qquad a_{n+1} = a_n + a_{n-1}, \quad n \ge 1
\]

So the initial portion of the sequence is 0, 1, 1, 2, 3, 5, 8, 13, 21, .... In this
subsection we will illustrate an application of eigenvalues and diagonalization through
the determination of a closed-form expression for an arbitrary term of this sequence.

To begin, verify that for any $n \ge 1$ the recursive statement above establishes the
truth of the statement

\[
\begin{bmatrix} a_n \\ a_{n+1} \end{bmatrix}
= \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} a_{n-1} \\ a_n \end{bmatrix}
\]

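Let $A$ denote this 2 × 2 matrix. Through repeated applications of the statement
above we have

\[
\begin{bmatrix} a_n \\ a_{n+1} \end{bmatrix}
= A\begin{bmatrix} a_{n-1} \\ a_n \end{bmatrix}
= A^2\begin{bmatrix} a_{n-2} \\ a_{n-1} \end{bmatrix}
= A^3\begin{bmatrix} a_{n-3} \\ a_{n-2} \end{bmatrix}
= \cdots
= A^n\begin{bmatrix} a_0 \\ a_1 \end{bmatrix}
\]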

In preparation for working with this high power of $A$, not unlike in Example
HPDM, we will diagonalize $A$. The characteristic polynomial of $A$ is $p_A(x) =
x^2 - x - 1$, with roots (the eigenvalues of $A$ by Theorem EMRCP)

\[
\rho = \frac{1 + \sqrt{5}}{2} \qquad \delta = \frac{1 - \sqrt{5}}{2}
\]

With two distinct eigenvalues, Theorem DED implies that $A$ is diagonalizable.
It will be easier to compute with these eigenvalues once you confirm the following
properties (all but the last can be derived from the fact that $\rho$ and $\delta$ are roots of
the characteristic polynomial, in a factored or unfactored form)

\[
\rho + \delta = 1 \qquad \rho\delta = -1 \qquad 1 + \rho = \rho^2 \qquad 1 + \delta = \delta^2 \qquad \rho - \delta = \sqrt{5}
\]


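Then eigenvectors of $A$ (for $\rho$ and $\delta$, respectively) are

\[
\begin{bmatrix} 1 \\ \rho \end{bmatrix} \qquad \begin{bmatrix} 1 \\ \delta \end{bmatrix}
\]

which can be easily confirmed, as we demonstrate for the eigenvector for $\rho$,

\[
\begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ \rho \end{bmatrix}
= \begin{bmatrix} \rho \\ 1 + \rho \end{bmatrix}
= \begin{bmatrix} \rho \\ \rho^2 \end{bmatrix}
= \rho\begin{bmatrix} 1 \\ \rho \end{bmatrix}
\]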

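From the proof of Theorem DC we know $A$ can be diagonalized by a matrix
$S$ with these eigenvectors as columns, giving $D = S^{-1}AS$. We list $S$, $S^{-1}$ and the
diagonal matrix $D$,

\[
S = \begin{bmatrix} 1 & 1 \\ \rho & \delta \end{bmatrix}
\qquad
S^{-1} = \frac{1}{\rho - \delta}\begin{bmatrix} -\delta & 1 \\ \rho & -1 \end{bmatrix}
\qquad
D = \begin{bmatrix} \rho & 0 \\ 0 & \delta \end{bmatrix}
\]

OK, we have everything in place now. The main step in the following is to replace
$A$ by $SDS^{-1}$. Here we go,

\begin{align*}
\begin{bmatrix} a_n \\ a_{n+1} \end{bmatrix}
&= A^n\begin{bmatrix} a_0 \\ a_1 \end{bmatrix} \\
&= \left(SDS^{-1}\right)^n\begin{bmatrix} a_0 \\ a_1 \end{bmatrix} \\
&= SDS^{-1}SDS^{-1}SDS^{-1} \cdots SDS^{-1}\begin{bmatrix} a_0 \\ a_1 \end{bmatrix} \\
&= SDDD \cdots DS^{-1}\begin{bmatrix} a_0 \\ a_1 \end{bmatrix} \\
&= SD^nS^{-1}\begin{bmatrix} a_0 \\ a_1 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 1 \\ \rho & \delta \end{bmatrix}
\begin{bmatrix} \rho & 0 \\ 0 & \delta \end{bmatrix}^n
\frac{1}{\rho - \delta}\begin{bmatrix} -\delta & 1 \\ \rho & -1 \end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \end{bmatrix} \\
&= \frac{1}{\rho - \delta}
\begin{bmatrix} 1 & 1 \\ \rho & \delta \end{bmatrix}
\begin{bmatrix} \rho^n & 0 \\ 0 & \delta^n \end{bmatrix}
\begin{bmatrix} -\delta & 1 \\ \rho & -1 \end{bmatrix}
\begin{bmatrix} 0 \\ 1 \end{bmatrix} \\
&= \frac{1}{\rho - \delta}
\begin{bmatrix} 1 & 1 \\ \rho & \delta \end{bmatrix}
\begin{bmatrix} \rho^n & 0 \\ 0 & \delta^n \end{bmatrix}
\begin{bmatrix} 1 \\ -1 \end{bmatrix} \\
&= \frac{1}{\rho - \delta}
\begin{bmatrix} 1 & 1 \\ \rho & \delta \end{bmatrix}
\begin{bmatrix} \rho^n \\ -\delta^n \end{bmatrix} \\
&= \frac{1}{\rho - \delta}
\begin{bmatrix} \rho^n - \delta^n \\ \rho^{n+1} - \delta^{n+1} \end{bmatrix}
\end{align*}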

Performing the scalar multiplication and equating the first entries of the two
vectors, we arrive at the closed form expression

\begin{align*}
a_n &= \frac{1}{\rho - \delta}\left(\rho^n - \delta^n\right) \\
&= \frac{1}{\sqrt{5}}\left(\left(\frac{1 + \sqrt{5}}{2}\right)^n - \left(\frac{1 - \sqrt{5}}{2}\right)^n\right) \\
&= \frac{1}{2^n\sqrt{5}}\left(\left(1 + \sqrt{5}\right)^n - \left(1 - \sqrt{5}\right)^n\right)
\end{align*}

Notice that it does not matter whether we use the equality of the first or second
entries of the vectors, we will arrive at the same formula, once in terms of $n$ and
again in terms of $n + 1$. Also, our definition clearly describes a sequence that will
only contain integers, yet the presence of the irrational number $\sqrt{5}$ might make us
suspicious. But no, our expression for $a_n$ will always yield an integer!

The  Fibonacci  sequence,  and  generalizations  of  it,  have  been  extensively  studied 
(Fibonacci  lived  in  the  12th  and  13th  centuries).  There  are  many  ways  to  derive 
the  closed-form  expression  we  just  found,  and  our  approach  may  not  be  the  most 
efficient  route.  But  it  is  a nice  demonstration  of  how  diagonalization  can  be  used  to 
solve a problem outside the field of linear algebra. △
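One way to see why the closed form $a_n = \frac{1}{2^n\sqrt{5}}\left((1+\sqrt{5})^n - (1-\sqrt{5})^n\right)$ always produces an integer is to carry out the arithmetic exactly. If $(1+\sqrt{5})^n = p + q\sqrt{5}$ with integers $p$ and $q$, then $(1-\sqrt{5})^n = p - q\sqrt{5}$, the difference is $2q\sqrt{5}$, and the $\sqrt{5}$ cancels. The sketch below follows that idea; the function names are our own, and this is an illustration rather than an efficient way to compute Fibonacci numbers.

```python
def fib_closed_form(n):
    """a_n = ((1+sqrt5)^n - (1-sqrt5)^n) / (2^n sqrt5), computed exactly."""
    # Track (1 + sqrt5)^n = p + q*sqrt5 using integers p and q.
    p, q = 1, 0
    for _ in range(n):
        # (p + q*sqrt5)(1 + sqrt5) = (p + 5q) + (p + q)*sqrt5
        p, q = p + 5 * q, p + q
    # The difference of the two powers is 2*q*sqrt5, so sqrt5 cancels.
    numerator, denominator = 2 * q, 2 ** n
    assert numerator % denominator == 0  # the quotient is an integer
    return numerator // denominator

def fib_recursive(n):
    """a_n from the defining recursion a_0 = 0, a_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

first_terms = [fib_closed_form(n) for n in range(9)]
```

The list `first_terms` reproduces the initial portion 0, 1, 1, 2, 3, 5, 8, 13, 21 quoted at the start of the example, and the two functions agree for every index tried.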

We  close  this  section  with  a comment  about  an  important  upcoming  theorem  that 
we  prove  in  Chapter  R.  A consequence  of  Theorem  OD  is  that  every  Hermitian  matrix 
(Definition  HM)  is  diagonalizable  (Definition  DZM),  and  the  similarity  transformation 
that  accomplishes  the  diagonalization  uses  a unitary  matrix  (Definition  UM).  This 
means that for every Hermitian matrix of size $n$ there is a basis of $\mathbb{C}^n$ that is
composed  entirely  of  eigenvectors  for  the  matrix  and  also  forms  an  orthonormal  set 
(Definition  ONS).  Notice  that  for  matrices  with  only  real  entries,  we  only  need  the 
hypothesis  that  the  matrix  is  symmetric  (Definition  SYM)  to  reach  this  conclusion 
(Example  ESMS4).  Can  you  imagine  a prettier  basis  for  use  with  a matrix?  I cannot. 

These  results  in  Section  OD  explain  much  of  our  recurring  interest  in  orthogonality, 
and  make  the  section  a high  point  in  your  study  of  linear  algebra.  A precise  statement 
of  this  diagonalization  result  applies  to  a slightly  broader  class  of  matrices,  known  as 
“normal”  matrices  (Definition  NRML),  which  are  matrices  that  commute  with  their 
adjoints.  With  this  expanded  category  of  matrices,  the  result  becomes  an  equivalence 
(Proof  Technique  E).  See  Theorem  OD  and  Theorem  OBNM  in  Section  OD  for  all 
the  details. 

Reading  Questions 

1.  What  is  an  equivalence  relation? 

2.  State  a condition  that  is  equivalent  to  a matrix  being  diagonalizable,  but  is  not  the 
definition. 

3.  Find  a diagonal  matrix  similar  to 


Exercises 

C20† Consider the matrix $A$ below. First, show that $A$ is diagonalizable by computing
the geometric multiplicities of the eigenvalues and quoting the relevant theorem. Second,
find a diagonal matrix $D$ and a nonsingular matrix $S$ so that $S^{-1}AS = D$. (See Exercise
EE.C20 for some of the necessary computations.)

\[
A = \begin{bmatrix}
18 & -15 & 33 & -15 \\
-4 & 8 & -6 & 6 \\
-9 & 9 & -16 & 9 \\
5 & -6 & 9 & -4
\end{bmatrix}
\]

C21† Determine if the matrix $A$ below is diagonalizable. If the matrix is diagonalizable,
then find a diagonal matrix $D$ that is similar to $A$, and provide the invertible matrix $S$
that performs the similarity transformation. You should use your calculator to find the
eigenvalues of the matrix, but try only using the row-reducing function of your calculator
to assist with finding eigenvectors.

\[
A = \begin{bmatrix}
1 & 9 & 9 & 24 \\
-3 & -27 & -29 & -68 \\
1 & 11 & 13 & 26 \\
1 & 7 & 7 & 18
\end{bmatrix}
\]


C22† Consider the matrix $A$ below. Find the eigenvalues of $A$ using a calculator and use
these to construct the characteristic polynomial of $A$, $p_A(x)$. State the algebraic multiplicity
of each eigenvalue. Find all of the eigenspaces for $A$ by computing expressions for null
spaces, only using your calculator to row-reduce matrices. State the geometric multiplicity
of each eigenvalue. Is $A$ diagonalizable? If not, explain why. If so, find a diagonal matrix $D$
that is similar to $A$.

\[
A = \begin{bmatrix}
19 & 25 & 30 & 5 \\
-23 & -30 & -35 & -5 \\
7 & 9 & 10 & 1 \\
-3 & -4 & -5 & -1
\end{bmatrix}
\]

T15† Suppose that $A$ and $B$ are similar matrices of size $n$. Prove that $A^3$ and $B^3$ are
similar matrices. Generalize.

T16† Suppose that $A$ and $B$ are similar matrices, with $A$ nonsingular. Prove that $B$ is
nonsingular, and that $A^{-1}$ is similar to $B^{-1}$.

T17† Suppose that $B$ is a nonsingular matrix. Prove that $AB$ is similar to $BA$.


Chapter  LT 

Linear  Transformations 


In  the  next  linear  algebra  course  you  take,  the  first  lecture  might  be  a reminder  about 
what a vector space is (Definition VS), its ten properties, basic theorems and then
some  examples.  The  second  lecture  would  likely  be  all  about  linear  transformations. 
While  it  may  seem  we  have  waited  a long  time  to  present  what  must  be  a central 
topic,  in  truth  we  have  already  been  working  with  linear  transformations  for  some 
time. 

Functions  are  important  objects  in  the  study  of  calculus,  but  have  been  absent 
from  this  course  until  now  (well,  not  really,  it  just  seems  that  way).  In  your  study  of 
more  advanced  mathematics  it  is  nearly  impossible  to  escape  the  use  of  functions  — 
they  are  as  fundamental  as  sets  are. 


Section  LT 

Linear  Transformations 

Early  in  Chapter  VS  we  prefaced  the  definition  of  a vector  space  with  the  comment 
that  it  was  “one  of  the  two  most  important  definitions  in  the  entire  course.”  Here 
comes  the  other.  Any  capsule  summary  of  linear  algebra  would  have  to  describe  the 
subject  as  the  interplay  of  linear  transformations  and  vector  spaces.  Here  we  go. 

Subsection  LT 
Linear  Transformations 

Definition  LT  Linear  Transformation 

A linear transformation, $T\colon U \to V$, is a function that carries elements of the
vector space $U$ (called the domain) to the vector space $V$ (called the codomain),
and which has two additional properties

1. $T(\mathbf{u}_1 + \mathbf{u}_2) = T(\mathbf{u}_1) + T(\mathbf{u}_2)$ for all $\mathbf{u}_1, \mathbf{u}_2 \in U$
2. $T(\alpha\mathbf{u}) = \alpha T(\mathbf{u})$ for all $\mathbf{u} \in U$ and all $\alpha \in \mathbb{C}$

□

The  two  defining  conditions  in  the  definition  of  a linear  transformation  should 
“feel  linear,”  whatever  that  means.  Conversely,  these  two  conditions  could  be  taken  as 
exactly  what  it  means  to  be  linear.  As  every  vector  space  property  derives  from  vector 
addition  and  scalar  multiplication,  so  too,  every  property  of  a linear  transformation 
derives  from  these  two  defining  properties.  While  these  conditions  may  be  reminiscent 
of  how  we  test  subspaces,  they  really  are  quite  different,  so  do  not  confuse  the  two. 

Here  are  two  diagrams  that  convey  the  essence  of  the  two  defining  properties 
of  a linear  transformation.  In  each  case,  begin  in  the  upper  left-hand  corner,  and 
follow  the  arrows  around  the  rectangle  to  the  lower-right  hand  corner,  taking  two 
different  routes  and  doing  the  indicated  operations  labeled  on  the  arrows.  There  are 
two  results  there.  For  a linear  transformation  these  two  expressions  are  always  equal. 


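\[
\begin{array}{ccc}
\mathbf{u}_1,\ \mathbf{u}_2 & \xrightarrow{\ T\ } & T(\mathbf{u}_1),\ T(\mathbf{u}_2) \\
\downarrow{\scriptstyle +} & & \downarrow{\scriptstyle +} \\
\mathbf{u}_1 + \mathbf{u}_2 & \xrightarrow{\ T\ } & T(\mathbf{u}_1 + \mathbf{u}_2) = T(\mathbf{u}_1) + T(\mathbf{u}_2)
\end{array}
\]

Diagram DLTA: Definition of Linear Transformation, Additive

\[
\begin{array}{ccc}
\mathbf{u} & \xrightarrow{\ T\ } & T(\mathbf{u}) \\
\downarrow{\scriptstyle \alpha} & & \downarrow{\scriptstyle \alpha} \\
\alpha\mathbf{u} & \xrightarrow{\ T\ } & T(\alpha\mathbf{u}) = \alpha T(\mathbf{u})
\end{array}
\]

Diagram DLTM: Definition of Linear Transformation, Multiplicative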

A couple of words about notation. $T$ is the name of the linear transformation,
and should be used when we want to discuss the function as a whole. $T(\mathbf{u})$ is how we
talk about the output of the function; it is a vector in the vector space $V$. When we
write $T(\mathbf{x} + \mathbf{y}) = T(\mathbf{x}) + T(\mathbf{y})$, the plus sign on the left is the operation of vector
addition in the vector space $U$, since $\mathbf{x}$ and $\mathbf{y}$ are elements of $U$. The plus sign on
the right is the operation of vector addition in the vector space $V$, since $T(\mathbf{x})$ and
$T(\mathbf{y})$ are elements of the vector space $V$. These two instances of vector addition
might be wildly different.

Let  us  examine  several  examples  and  begin  to  form  a catalog  of  known  linear 
transformations  to  work  with. 


Example  ALT  A linear  transformation 

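Define $T\colon \mathbb{C}^3 \to \mathbb{C}^2$ by describing the output of the function for a generic input with
the formula

\[
T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 + x_3 \\ -4x_2 \end{bmatrix}
\]

and check the two defining properties.

\begin{align*}
T(\mathbf{x} + \mathbf{y})
&= T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}\right) \\
&= T\left(\begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ x_3 + y_3 \end{bmatrix}\right) \\
&= \begin{bmatrix} 2(x_1 + y_1) + (x_3 + y_3) \\ -4(x_2 + y_2) \end{bmatrix} \\
&= \begin{bmatrix} (2x_1 + x_3) + (2y_1 + y_3) \\ -4x_2 + (-4)y_2 \end{bmatrix} \\
&= \begin{bmatrix} 2x_1 + x_3 \\ -4x_2 \end{bmatrix} + \begin{bmatrix} 2y_1 + y_3 \\ -4y_2 \end{bmatrix} \\
&= T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) + T\left(\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}\right) \\
&= T(\mathbf{x}) + T(\mathbf{y})
\end{align*}

and

\begin{align*}
T(\alpha\mathbf{x})
&= T\left(\alpha\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) \\
&= T\left(\begin{bmatrix} \alpha x_1 \\ \alpha x_2 \\ \alpha x_3 \end{bmatrix}\right) \\
&= \begin{bmatrix} 2(\alpha x_1) + (\alpha x_3) \\ -4(\alpha x_2) \end{bmatrix} \\
&= \begin{bmatrix} \alpha(2x_1 + x_3) \\ \alpha(-4x_2) \end{bmatrix} \\
&= \alpha\begin{bmatrix} 2x_1 + x_3 \\ -4x_2 \end{bmatrix} \\
&= \alpha T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) \\
&= \alpha T(\mathbf{x})
\end{align*}

So by Definition LT, $T$ is a linear transformation. △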
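The two defining properties can also be spot-checked numerically for the transformation of Example ALT. The sketch below uses Python's built-in complex numbers; the helpers `add` and `scale` and the sample vectors are our own choices. A check on sample inputs only illustrates the properties, since a proof must handle all vectors and scalars, as the derivation in the example does.

```python
def T(x):
    """T: C^3 -> C^2 from Example ALT."""
    x1, x2, x3 = x
    return [2 * x1 + x3, -4 * x2]

def add(u, v):
    """Vector addition, entry by entry."""
    return [a + b for a, b in zip(u, v)]

def scale(alpha, u):
    """Scalar multiplication, entry by entry."""
    return [alpha * a for a in u]

# Sample inputs with small integer real and imaginary parts (arithmetic is exact).
x = [1 + 2j, -3, 4j]
y = [5, 2 - 1j, -1]
alpha = 2 - 3j

additive_ok = T(add(x, y)) == add(T(x), T(y))
homogeneous_ok = T(scale(alpha, x)) == scale(alpha, T(x))
```

Both flags come out true for these inputs. Notice that `add` is applied to vectors with three entries on the left of each comparison and to vectors with two entries on the right, a small reminder that the two additions live in different vector spaces.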

It  can  be  just  as  instructive  to  look  at  functions  that  are  not  linear  transformations. 
Since  the  defining  conditions  must  be  true  for  all  vectors  and  scalars,  it  is  enough  to 
find  just  one  situation  where  the  properties  fail. 


Example  NLT  Not  a linear  transformation 
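Define $S\colon \mathbb{C}^3 \to \mathbb{C}^3$ by

\[
S\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 4x_1 + 2x_2 \\ 0 \\ x_1 + 3x_3 - 2 \end{bmatrix}
\]

This function "looks" linear, but consider

\[
3\,S\left(\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\right) = 3\begin{bmatrix} 8 \\ 0 \\ 8 \end{bmatrix} = \begin{bmatrix} 24 \\ 0 \\ 24 \end{bmatrix}
\]

while

\[
S\left(3\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\right) = S\left(\begin{bmatrix} 3 \\ 6 \\ 9 \end{bmatrix}\right) = \begin{bmatrix} 24 \\ 0 \\ 28 \end{bmatrix}
\]

So the second required property fails for the choice of $\alpha = 3$ and $\mathbf{x} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and by
Definition LT, $S$ is not a linear transformation. It is just about as easy to find an
example where the first defining property fails (try it!). Notice that it is the "−2" in
the third component of the definition of $S$ that prevents the function from being a
linear transformation. △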
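Because a single failure is enough to rule out linearity, the counterexample of Example NLT takes only a few lines to reproduce. In this sketch the variable names are our own; the function and the inputs $\alpha = 3$, $\mathbf{x} = (1, 2, 3)$ are those of the example.

```python
def S(x):
    """S: C^3 -> C^3 from Example NLT (note the constant -2 in the third entry)."""
    x1, x2, x3 = x
    return [4 * x1 + 2 * x2, 0, x1 + 3 * x3 - 2]

x = [1, 2, 3]
scaled_output = [3 * c for c in S(x)]      # 3 S(x)
output_of_scaled = S([3 * c for c in x])   # S(3x)
```

The first result is `[24, 0, 24]` and the second is `[24, 0, 28]`; they differ in the third entry, exactly where the constant term sits.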


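Example LTPM Linear transformation, polynomials to matrices
Define a linear transformation $T\colon P_3\to M_{22}$ by

\[
T\left(a+bx+cx^2+dx^3\right)=\begin{bmatrix}a+b&a-2c\\d&b-d\end{bmatrix}
\]

We verify the two defining conditions of a linear transformation.

\[
\begin{aligned}
T(\mathbf{x}+\mathbf{y})
&=T\left((a_1+b_1x+c_1x^2+d_1x^3)+(a_2+b_2x+c_2x^2+d_2x^3)\right)\\
&=T\left((a_1+a_2)+(b_1+b_2)x+(c_1+c_2)x^2+(d_1+d_2)x^3\right)\\
&=\begin{bmatrix}(a_1+a_2)+(b_1+b_2)&(a_1+a_2)-2(c_1+c_2)\\d_1+d_2&(b_1+b_2)-(d_1+d_2)\end{bmatrix}\\
&=\begin{bmatrix}(a_1+b_1)+(a_2+b_2)&(a_1-2c_1)+(a_2-2c_2)\\d_1+d_2&(b_1-d_1)+(b_2-d_2)\end{bmatrix}\\
&=\begin{bmatrix}a_1+b_1&a_1-2c_1\\d_1&b_1-d_1\end{bmatrix}+\begin{bmatrix}a_2+b_2&a_2-2c_2\\d_2&b_2-d_2\end{bmatrix}\\
&=T\left(a_1+b_1x+c_1x^2+d_1x^3\right)+T\left(a_2+b_2x+c_2x^2+d_2x^3\right)\\
&=T(\mathbf{x})+T(\mathbf{y})
\end{aligned}
\]

and

\[
\begin{aligned}
T(\alpha\mathbf{x})
&=T\left(\alpha(a+bx+cx^2+dx^3)\right)\\
&=T\left((\alpha a)+(\alpha b)x+(\alpha c)x^2+(\alpha d)x^3\right)\\
&=\begin{bmatrix}(\alpha a)+(\alpha b)&(\alpha a)-2(\alpha c)\\\alpha d&(\alpha b)-(\alpha d)\end{bmatrix}\\
&=\begin{bmatrix}\alpha(a+b)&\alpha(a-2c)\\\alpha d&\alpha(b-d)\end{bmatrix}\\
&=\alpha\begin{bmatrix}a+b&a-2c\\d&b-d\end{bmatrix}\\
&=\alpha T\left(a+bx+cx^2+dx^3\right)\\
&=\alpha T(\mathbf{x})
\end{aligned}
\]

So by Definition LT, $T$ is a linear transformation. △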
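A numerical spot check of Example LTPM can be sketched as follows. It is not a substitute for the proof above, since it tests only one pair of polynomials and one scalar. A polynomial $a+bx+cx^2+dx^3$ is stored as the tuple `(a, b, c, d)`; this representation, the helper names, and the sample values are assumptions made for the illustration.

```python
def T(p):
    """Example LTPM: a + bx + cx^2 + dx^3  ->  [[a + b, a - 2c], [d, b - d]]."""
    a, b, c, d = p
    return [[a + b, a - 2 * c], [d, b - d]]


def poly_add(p, q):
    """Add two polynomials given as coefficient tuples."""
    return tuple(u + v for u, v in zip(p, q))


def poly_scale(alpha, p):
    """Scalar multiple of a polynomial given as a coefficient tuple."""
    return tuple(alpha * u for u in p)


def mat_add(A, B):
    """Entrywise sum of two matrices stored as lists of rows."""
    return [[u + v for u, v in zip(r, s)] for r, s in zip(A, B)]


def mat_scale(alpha, A):
    """Scalar multiple of a matrix stored as a list of rows."""
    return [[alpha * u for u in row] for row in A]


p = (1, -2, 4, 3)    # 1 - 2x + 4x^2 + 3x^3 (arbitrary sample)
q = (0, 5, -1, 2)    # 5x - x^2 + 2x^3     (arbitrary sample)
alpha = 7

additive = T(poly_add(p, q)) == mat_add(T(p), T(q))
homogeneous = T(poly_scale(alpha, p)) == mat_scale(alpha, T(p))
print(additive, homogeneous)
```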

Example  LTPP  Linear  transformation,  polynomials  to  polynomials 
Define a function $S\colon P_4\to P_5$ by

\[
S(p(x))=(x-2)p(x)
\]

Then

\[
\begin{aligned}
S(p(x)+q(x))&=(x-2)(p(x)+q(x))=(x-2)p(x)+(x-2)q(x)=S(p(x))+S(q(x))\\
S(\alpha p(x))&=(x-2)(\alpha p(x))=(x-2)\alpha p(x)=\alpha(x-2)p(x)=\alpha S(p(x))
\end{aligned}
\]

So by Definition LT, $S$ is a linear transformation. △

Linear  transformations  have  many  amazing  properties,  which  we  will  investigate 
through  the  next  few  sections.  However,  as  a taste  of  things  to  come,  here  is  a 
theorem  we  can  prove  now  and  put  to  use  immediately. 

Theorem  LTTZZ  Linear  Transformations  Take  Zero  to  Zero 
Suppose $T\colon U\to V$ is a linear transformation. Then $T(\mathbf{0})=\mathbf{0}$.

Proof. The two zero vectors in the conclusion of the theorem are different. The first is from $U$ while the second is from $V$. We will subscript the zero vectors in this proof to highlight the distinction. Think about your objects. (This proof is contributed by Mark Shoemaker).

\[
\begin{aligned}
T(\mathbf{0}_U)&=T(0\,\mathbf{0}_U)&&\text{Theorem ZSSM in }U\\
&=0\,T(\mathbf{0}_U)&&\text{Definition LT}\\
&=\mathbf{0}_V&&\text{Theorem ZSSM in }V
\end{aligned}
\]

■

Return to Example NLT and compute $S\left(\begin{bmatrix}0\\0\\0\end{bmatrix}\right)=\begin{bmatrix}0\\0\\-2\end{bmatrix}$ to quickly see again that $S$ is not a linear transformation, while in Example LTPM compute

\[
T\left(0+0x+0x^2+0x^3\right)=\begin{bmatrix}0&0\\0&0\end{bmatrix}
\]

as an example of Theorem LTTZZ at work.


Subsection  LTC 

Linear  Transformation  Cartoons 

Throughout this chapter, and Chapter R, we will include drawings of linear transformations.
We will call them “cartoons,” not because they are humorous, but because
they  will  only  expose  a portion  of  the  truth.  A Bugs  Bunny  cartoon  might  give  us 
some  insights  on  human  nature,  but  the  rules  of  physics  and  biology  are  routinely 
(and  grossly)  violated.  So  it  will  be  with  our  linear  transformation  cartoons. 
Here  is  our  first,  followed  by  a guide  to  help  you  understand  how  these  are  meant 
to  describe  fundamental  truths  about  linear  transformations,  while  simultaneously 
violating other truths.

Diagram GLT: General Linear Transformation

Here we picture a linear transformation $T\colon U\to V$, where this information will
be  consistently  displayed  along  the  bottom  edge.  The  ovals  are  meant  to  represent 
the  vector  spaces,  in  this  case  U,  the  domain,  on  the  left  and  V,  the  codomain,  on 
the  right.  Of  course,  vector  spaces  are  typically  infinite  sets,  so  you  will  have  to 
imagine  that  characteristic  of  these  sets.  A small  dot  inside  of  an  oval  will  represent 
a vector  within  that  vector  space,  sometimes  with  a name,  sometimes  not  (in  this 
case  every  vector  has  a name).  The  sizes  of  the  ovals  are  meant  to  be  proportional 
to  the  dimensions  of  the  vector  spaces.  However,  when  we  make  no  assumptions 
about  the  dimensions,  we  will  draw  the  ovals  as  the  same  size,  as  we  have  done  here 
(which  is  not  meant  to  suggest  that  the  dimensions  have  to  be  equal). 

To  convey  that  the  linear  transformation  associates  a certain  input  with  a certain 
output,  we  will  draw  an  arrow  from  the  input  to  the  output.  So,  for  example, 
in this cartoon we suggest that $T(\mathbf{x})=\mathbf{y}$. Nothing in the definition of a linear transformation prevents two different inputs being sent to the same output and we see this in $T(\mathbf{u})=\mathbf{v}=T(\mathbf{w})$. Similarly, an output may not have any input being sent its way, as illustrated by no arrow pointing at $\mathbf{t}$. In this cartoon, we have captured the essence of our one general theorem about linear transformations, Theorem LTTZZ, $T(\mathbf{0}_U)=\mathbf{0}_V$. On occasion we might include this basic fact when it is relevant, at other times maybe not. Note that the definition of a linear transformation requires that it be a function, so every element of the domain should be associated with some element of the codomain. This will be reflected by never having an element of the domain without an arrow originating there.

These  cartoons  are  of  course  no  substitute  for  careful  definitions  and  proofs,  but 
they  can  be  a handy  way  to  think  about  the  various  properties  we  will  be  studying. 


Subsection  MLT 

Matrices  and  Linear  Transformations 


If  you  give  me  a matrix,  then  I can  quickly  build  you  a linear  transformation.  Always. 
First  a motivating  example  and  then  the  theorem. 


Example  LTM  Linear  transformation  from  a matrix 
Let

\[
A=\begin{bmatrix}3&-1&8&1\\2&0&5&-2\\1&1&3&-7\end{bmatrix}
\]

and define a function $P\colon\mathbb{C}^4\to\mathbb{C}^3$ by

\[
P(\mathbf{x})=A\mathbf{x}
\]

So we are using an old friend, the matrix-vector product (Definition MVP) as a way to convert a vector with 4 components into a vector with 3 components. Applying Definition MVP allows us to write the defining formula for $P$ in a slightly different form,

\[
A\mathbf{x}=\begin{bmatrix}3&-1&8&1\\2&0&5&-2\\1&1&3&-7\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix}
=x_1\begin{bmatrix}3\\2\\1\end{bmatrix}+x_2\begin{bmatrix}-1\\0\\1\end{bmatrix}+x_3\begin{bmatrix}8\\5\\3\end{bmatrix}+x_4\begin{bmatrix}1\\-2\\-7\end{bmatrix}
\]

So we recognize the action of the function $P$ as using the components of the vector $(x_1,\,x_2,\,x_3,\,x_4)$ as scalars to form the output of $P$ as a linear combination of the four columns of the matrix $A$, which are all members of $\mathbb{C}^3$, so the result is a vector in $\mathbb{C}^3$. We can rearrange this expression further, using our definitions of operations in $\mathbb{C}^3$ (Section VO).

\[
\begin{aligned}
P(\mathbf{x})&=A\mathbf{x}&&\text{Definition of }P\\
&=x_1\begin{bmatrix}3\\2\\1\end{bmatrix}+x_2\begin{bmatrix}-1\\0\\1\end{bmatrix}+x_3\begin{bmatrix}8\\5\\3\end{bmatrix}+x_4\begin{bmatrix}1\\-2\\-7\end{bmatrix}&&\text{Definition MVP}\\
&=\begin{bmatrix}3x_1\\2x_1\\x_1\end{bmatrix}+\begin{bmatrix}-x_2\\0\\x_2\end{bmatrix}+\begin{bmatrix}8x_3\\5x_3\\3x_3\end{bmatrix}+\begin{bmatrix}x_4\\-2x_4\\-7x_4\end{bmatrix}&&\text{Definition CVSM}\\
&=\begin{bmatrix}3x_1-x_2+8x_3+x_4\\2x_1+5x_3-2x_4\\x_1+x_2+3x_3-7x_4\end{bmatrix}&&\text{Definition CVA}
\end{aligned}
\]

You might recognize this final expression as being similar in style to some previous
examples  (Example  ALT)  and  some  linear  transformations  defined  in  the  archetypes 
(Archetype  M through  Archetype  R).  But  the  expression  that  says  the  output  of 
this  linear  transformation  is  a linear  combination  of  the  columns  of  A is  probably 
the  most  powerful  way  of  thinking  about  examples  of  this  type. 

Almost  forgot  — we  should  verify  that  P is  indeed  a linear  transformation.  This 
is  easy  with  two  matrix  properties  from  Section  MM. 


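\[
\begin{aligned}
P(\mathbf{x}+\mathbf{y})&=A(\mathbf{x}+\mathbf{y})&&\text{Definition of }P\\
&=A\mathbf{x}+A\mathbf{y}&&\text{Theorem MMDAA}\\
&=P(\mathbf{x})+P(\mathbf{y})&&\text{Definition of }P
\end{aligned}
\]

and

\[
\begin{aligned}
P(\alpha\mathbf{x})&=A(\alpha\mathbf{x})&&\text{Definition of }P\\
&=\alpha(A\mathbf{x})&&\text{Theorem MMSMM}\\
&=\alpha P(\mathbf{x})&&\text{Definition of }P
\end{aligned}
\]

So by Definition LT, $P$ is a linear transformation. △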

So  the  multiplication  of  a vector  by  a matrix  “transforms”  the  input  vector  into 
an  output  vector,  possibly  of  a different  size,  by  performing  a linear  combination. 
And  this  transformation  happens  in  a “linear”  fashion.  This  “functional”  view  of 
the  matrix-vector  product  is  the  most  important  shift  you  can  make  right  now  in 
how  you  think  about  linear  algebra.  Here  is  the  theorem,  whose  proof  is  very  nearly 
an  exact  copy  of  the  verification  in  the  last  example. 

Theorem  MBLT  Matrices  Build  Linear  Transformations 

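Suppose that $A$ is an $m\times n$ matrix. Define a function $T\colon\mathbb{C}^n\to\mathbb{C}^m$ by $T(\mathbf{x})=A\mathbf{x}$. Then $T$ is a linear transformation.

Proof.

\[
\begin{aligned}
T(\mathbf{x}+\mathbf{y})&=A(\mathbf{x}+\mathbf{y})&&\text{Definition of }T\\
&=A\mathbf{x}+A\mathbf{y}&&\text{Theorem MMDAA}\\
&=T(\mathbf{x})+T(\mathbf{y})&&\text{Definition of }T
\end{aligned}
\]

and

\[
\begin{aligned}
T(\alpha\mathbf{x})&=A(\alpha\mathbf{x})&&\text{Definition of }T\\
&=\alpha(A\mathbf{x})&&\text{Theorem MMSMM}\\
&=\alpha T(\mathbf{x})&&\text{Definition of }T
\end{aligned}
\]

So by Definition LT, $T$ is a linear transformation. ■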


So Theorem MBLT gives us a rapid way to construct linear transformations. Grab an $m\times n$ matrix $A$, define $T(\mathbf{x})=A\mathbf{x}$ and Theorem MBLT tells us that $T$ is a linear transformation from $\mathbb{C}^n$ to $\mathbb{C}^m$, without any further checking.
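To make the “linear combination of columns” viewpoint concrete, here is a minimal sketch in plain Python using the matrix $A$ of Example LTM. The function names and the sample vectors are assumptions chosen for illustration. It computes $A\mathbf{x}$ twice, once row by row and once as a combination of the columns of $A$ as in Definition MVP, and then spot-checks the two properties of Definition LT.

```python
A = [[3, -1, 8, 1],
     [2, 0, 5, -2],
     [1, 1, 3, -7]]


def matvec(A, x):
    """A x computed entry by entry (each entry is a row of A dotted with x)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def column_combination(A, x):
    """A x computed as x[0]*(column 0) + x[1]*(column 1) + ... (Definition MVP)."""
    m = len(A)
    result = [0] * m
    for j, xj in enumerate(x):
        for i in range(m):
            result[i] += xj * A[i][j]
    return result


def P(x):
    """The function of Example LTM, P(x) = A x."""
    return matvec(A, x)


x = [1, 2, -1, 3]     # arbitrary sample input
y = [0, 4, 2, -5]     # arbitrary sample input
alpha = 6

same_answer = matvec(A, x) == column_combination(A, x)
additive = P([a + b for a, b in zip(x, y)]) == [a + b for a, b in zip(P(x), P(y))]
homogeneous = P([alpha * a for a in x]) == [alpha * a for a in P(x)]
print(P(x), same_answer, additive, homogeneous)
```

A check on sample vectors is only evidence, of course; Theorem MBLT is what guarantees the properties for every input.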

We  can  turn  Theorem  MBLT  around.  You  give  me  a linear  transformation  and  I 
will  give  you  a matrix. 


Example  MFLT  Matrix  from  a linear  transformation 
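Define the function $R\colon\mathbb{C}^3\to\mathbb{C}^4$ by

\[
R\left(\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}\right)=\begin{bmatrix}2x_1-3x_2+4x_3\\x_1+x_2+x_3\\-x_1+5x_2-3x_3\\x_2-4x_3\end{bmatrix}
\]

You could verify that $R$ is a linear transformation by applying the definition, but we will instead massage the expression defining a typical output until we recognize the form of a known class of linear transformations.

\[
\begin{aligned}
R\left(\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}\right)
&=\begin{bmatrix}2x_1-3x_2+4x_3\\x_1+x_2+x_3\\-x_1+5x_2-3x_3\\x_2-4x_3\end{bmatrix}\\
&=\begin{bmatrix}2x_1\\x_1\\-x_1\\0\end{bmatrix}+\begin{bmatrix}-3x_2\\x_2\\5x_2\\x_2\end{bmatrix}+\begin{bmatrix}4x_3\\x_3\\-3x_3\\-4x_3\end{bmatrix}&&\text{Definition CVA}\\
&=x_1\begin{bmatrix}2\\1\\-1\\0\end{bmatrix}+x_2\begin{bmatrix}-3\\1\\5\\1\end{bmatrix}+x_3\begin{bmatrix}4\\1\\-3\\-4\end{bmatrix}&&\text{Definition CVSM}\\
&=\begin{bmatrix}2&-3&4\\1&1&1\\-1&5&-3\\0&1&-4\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}&&\text{Definition MVP}
\end{aligned}
\]

So if we define the matrix

\[
B=\begin{bmatrix}2&-3&4\\1&1&1\\-1&5&-3\\0&1&-4\end{bmatrix}
\]

then $R(\mathbf{x})=B\mathbf{x}$. By Theorem MBLT, we can easily recognize $R$ as a linear transformation since it has the form described in the hypothesis of the theorem. △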


Example  MFLT  was  not  an  accident.  Consider  any  one  of  the  archetypes  where 
both  the  domain  and  codomain  are  sets  of  column  vectors  (Archetype  M through 
Archetype  R)  and  you  should  be  able  to  mimic  the  previous  example.  Here  is  the 
theorem, which is notable since it is our first occasion to use the full power of the
defining properties of a linear transformation when our hypothesis includes a linear
transformation. 

Theorem  MLTCV  Matrix  of  a Linear  Transformation,  Column  Vectors 
Suppose that $T\colon\mathbb{C}^n\to\mathbb{C}^m$ is a linear transformation. Then there is an $m\times n$ matrix $A$ such that $T(\mathbf{x})=A\mathbf{x}$.


Proof.  The  conclusion  says  a certain  matrix  exists.  What  better  way  to  prove 
something  exists  than  to  actually  build  it?  So  our  proof  will  be  constructive  (Proof 
Technique  C),  and  the  procedure  that  we  will  use  abstractly  in  the  proof  can  be 
used  concretely  in  specific  examples. 

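Let $\mathbf{e}_1,\,\mathbf{e}_2,\,\mathbf{e}_3,\,\ldots,\,\mathbf{e}_n$ be the columns of the identity matrix of size $n$, $I_n$ (Definition SUV). Evaluate the linear transformation $T$ with each of these standard unit vectors as an input, and record the result. In other words, define $n$ vectors in $\mathbb{C}^m$, $\mathbf{A}_i$, $1\le i\le n$, by

\[
\mathbf{A}_i=T(\mathbf{e}_i)
\]

Then package up these vectors as the columns of a matrix

\[
A=[\mathbf{A}_1|\mathbf{A}_2|\mathbf{A}_3|\ldots|\mathbf{A}_n]
\]

Does $A$ have the desired properties? First, $A$ is clearly an $m\times n$ matrix. Then

\[
\begin{aligned}
T(\mathbf{x})&=T(I_n\mathbf{x})&&\text{Theorem MMIM}\\
&=T\left([\mathbf{e}_1|\mathbf{e}_2|\mathbf{e}_3|\ldots|\mathbf{e}_n]\,\mathbf{x}\right)&&\text{Definition SUV}\\
&=T\left([\mathbf{x}]_1\mathbf{e}_1+[\mathbf{x}]_2\mathbf{e}_2+[\mathbf{x}]_3\mathbf{e}_3+\cdots+[\mathbf{x}]_n\mathbf{e}_n\right)&&\text{Definition MVP}\\
&=T([\mathbf{x}]_1\mathbf{e}_1)+T([\mathbf{x}]_2\mathbf{e}_2)+T([\mathbf{x}]_3\mathbf{e}_3)+\cdots+T([\mathbf{x}]_n\mathbf{e}_n)&&\text{Definition LT}\\
&=[\mathbf{x}]_1T(\mathbf{e}_1)+[\mathbf{x}]_2T(\mathbf{e}_2)+[\mathbf{x}]_3T(\mathbf{e}_3)+\cdots+[\mathbf{x}]_nT(\mathbf{e}_n)&&\text{Definition LT}\\
&=[\mathbf{x}]_1\mathbf{A}_1+[\mathbf{x}]_2\mathbf{A}_2+[\mathbf{x}]_3\mathbf{A}_3+\cdots+[\mathbf{x}]_n\mathbf{A}_n&&\text{Definition of }\mathbf{A}_i\\
&=A\mathbf{x}&&\text{Definition MVP}
\end{aligned}
\]

as desired. ■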


So  if  we  were  to  restrict  our  study  of  linear  transformations  to  those  where  the 
domain  and  codomain  are  both  vector  spaces  of  column  vectors  (Definition  VSCV), 
every  matrix  leads  to  a linear  transformation  of  this  type  (Theorem  MBLT),  while 
every  such  linear  transformation  leads  to  a matrix  (Theorem  MLTCV).  So  matrices 
and  linear  transformations  are  fundamentally  the  same.  We  call  the  matrix  A of 
Theorem  MLTCV  the  matrix  representation  of  T. 

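We have defined linear transformations for more general vector spaces than just $\mathbb{C}^m$. Can we extend this correspondence between linear transformations and matrices to more general linear transformations (more general domains and codomains)? Yes, and this is the main theme of Chapter R. Stay tuned. For now, let us illustrate Theorem MLTCV with an example.

Example MOLT Matrix of a linear transformation
Suppose $S\colon\mathbb{C}^3\to\mathbb{C}^4$ is defined by

\[
S\left(\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}\right)=\begin{bmatrix}3x_1-2x_2+5x_3\\x_1+x_2+x_3\\9x_1-2x_2+5x_3\\4x_2\end{bmatrix}
\]

Then

\[
\begin{aligned}
\mathbf{C}_1&=S(\mathbf{e}_1)=S\left(\begin{bmatrix}1\\0\\0\end{bmatrix}\right)=\begin{bmatrix}3\\1\\9\\0\end{bmatrix}\\
\mathbf{C}_2&=S(\mathbf{e}_2)=S\left(\begin{bmatrix}0\\1\\0\end{bmatrix}\right)=\begin{bmatrix}-2\\1\\-2\\4\end{bmatrix}\\
\mathbf{C}_3&=S(\mathbf{e}_3)=S\left(\begin{bmatrix}0\\0\\1\end{bmatrix}\right)=\begin{bmatrix}5\\1\\5\\0\end{bmatrix}
\end{aligned}
\]

so define

\[
C=[\mathbf{C}_1|\mathbf{C}_2|\mathbf{C}_3]=\begin{bmatrix}3&-2&5\\1&1&1\\9&-2&5\\0&4&0\end{bmatrix}
\]

and Theorem MLTCV guarantees that $S(\mathbf{x})=C\mathbf{x}$.

As an illuminating exercise, let $\mathbf{z}=\begin{bmatrix}2\\-3\\3\end{bmatrix}$ and compute $S(\mathbf{z})$ two different ways. First, return to the definition of $S$ and evaluate $S(\mathbf{z})$ directly. Then do the matrix-vector product $C\mathbf{z}$. In both cases you should obtain the vector

\[
S(\mathbf{z})=\begin{bmatrix}27\\2\\39\\-12\end{bmatrix}
\]

△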
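The constructive proof of Theorem MLTCV translates almost line for line into a procedure: feed the standard unit vectors to the transformation and use the outputs as columns. The sketch below does this for the transformation $S$ of Example MOLT in plain Python. The function name `matrix_of` is a hypothetical helper introduced here, and the procedure is only meaningful when the function passed in really is linear.

```python
def S(x):
    """The linear transformation of Example MOLT, from C^3 to C^4."""
    x1, x2, x3 = x
    return [3 * x1 - 2 * x2 + 5 * x3,
            x1 + x2 + x3,
            9 * x1 - 2 * x2 + 5 * x3,
            4 * x2]


def matrix_of(T, n):
    """Build the matrix whose columns are T(e_1), ..., T(e_n); returned as a list of rows."""
    columns = []
    for i in range(n):
        e_i = [1 if j == i else 0 for j in range(n)]   # standard unit vector
        columns.append(T(e_i))
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]


C = matrix_of(S, 3)
z = [2, -3, 3]

direct = S(z)                                                   # evaluate the formula
product = [sum(c * zj for c, zj in zip(row, z)) for row in C]   # matrix-vector product C z

for row in C:
    print(row)
print(direct, product)
```

Both routes give the same vector, which is the “illuminating exercise” of the example carried out by machine.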


Subsection  LTLC 

Linear  Transformations  and  Linear  Combinations 

It  is  the  interaction  between  linear  transformations  and  linear  combinations  that  lies 
at  the  heart  of  many  of  the  important  theorems  of  linear  algebra.  The  next  theorem 
distills  the  essence  of  this.  The  proof  is  not  deep,  the  result  is  hardly  startling, 
but it will be referenced frequently. We have already passed by one occasion to
employ it, in the proof of Theorem MLTCV. Paraphrasing, this theorem says that
we  can  “push”  linear  transformations  “down  into”  linear  combinations,  or  “pull” 
linear  transformations  “up  out”  of  linear  combinations.  We  will  have  opportunities 
to  both  push  and  pull. 

Theorem  LTLC  Linear  Transformations  and  Linear  Combinations 

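Suppose that $T\colon U\to V$ is a linear transformation, $\mathbf{u}_1,\,\mathbf{u}_2,\,\mathbf{u}_3,\,\ldots,\,\mathbf{u}_t$ are vectors from $U$ and $a_1,\,a_2,\,a_3,\,\ldots,\,a_t$ are scalars from $\mathbb{C}$. Then

\[
T(a_1\mathbf{u}_1+a_2\mathbf{u}_2+a_3\mathbf{u}_3+\cdots+a_t\mathbf{u}_t)=a_1T(\mathbf{u}_1)+a_2T(\mathbf{u}_2)+a_3T(\mathbf{u}_3)+\cdots+a_tT(\mathbf{u}_t)
\]

Proof.

\[
\begin{aligned}
T(a_1\mathbf{u}_1&+a_2\mathbf{u}_2+a_3\mathbf{u}_3+\cdots+a_t\mathbf{u}_t)\\
&=T(a_1\mathbf{u}_1)+T(a_2\mathbf{u}_2)+T(a_3\mathbf{u}_3)+\cdots+T(a_t\mathbf{u}_t)&&\text{Definition LT}\\
&=a_1T(\mathbf{u}_1)+a_2T(\mathbf{u}_2)+a_3T(\mathbf{u}_3)+\cdots+a_tT(\mathbf{u}_t)&&\text{Definition LT}
\end{aligned}
\]

■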


Some  authors,  especially  in  more  advanced  texts,  take  the  conclusion  of  Theorem 
LTLC  as  the  defining  condition  of  a linear  transformation.  This  has  the  appeal  of 
being  a single  condition,  rather  than  the  two-part  condition  of  Definition  LT.  (See 
Exercise  LT.T20). 

Our  next  theorem  says,  informally,  that  it  is  enough  to  know  how  a linear 
transformation  behaves  for  inputs  from  any  basis  of  the  domain,  and  all  the  other 
outputs  are  described  by  a linear  combination  of  these  few  values.  Again,  the 
statement  of  the  theorem,  and  its  proof,  are  not  remarkable,  but  the  insight  that 
goes  along  with  it  is  very  fundamental. 

Theorem  LTDB  Linear  Transformation  Defined  on  a Basis 

Suppose $U$ is a vector space with basis $B=\{\mathbf{u}_1,\,\mathbf{u}_2,\,\mathbf{u}_3,\,\ldots,\,\mathbf{u}_n\}$ and the vector space $V$ contains the vectors $\mathbf{v}_1,\,\mathbf{v}_2,\,\mathbf{v}_3,\,\ldots,\,\mathbf{v}_n$ (which may not be distinct). Then there is a unique linear transformation, $T\colon U\to V$, such that $T(\mathbf{u}_i)=\mathbf{v}_i$, $1\le i\le n$.

Proof. To prove the existence of $T$, we construct a function and show that it is a linear transformation (Proof Technique C). Suppose $\mathbf{w}\in U$ is an arbitrary element of the domain. Then by Theorem VRRB there are unique scalars $a_1,\,a_2,\,a_3,\,\ldots,\,a_n$ such that

\[
\mathbf{w}=a_1\mathbf{u}_1+a_2\mathbf{u}_2+a_3\mathbf{u}_3+\cdots+a_n\mathbf{u}_n
\]

Then define the function $T$ by

\[
T(\mathbf{w})=a_1\mathbf{v}_1+a_2\mathbf{v}_2+a_3\mathbf{v}_3+\cdots+a_n\mathbf{v}_n
\]

It should be clear that $T$ behaves as required for $n$ inputs from $B$. Since the scalars provided by Theorem VRRB are unique, there is no ambiguity in this definition, and $T$ qualifies as a function with domain $U$ and codomain $V$ (i.e. $T$ is well-defined). But is $T$ a linear transformation as well?

Let $\mathbf{x}\in U$ be a second element of the domain, and suppose the scalars provided by Theorem VRRB (relative to $B$) are $b_1,\,b_2,\,b_3,\,\ldots,\,b_n$. Then

\[
\begin{aligned}
T(\mathbf{w}+\mathbf{x})&=T(a_1\mathbf{u}_1+\cdots+a_n\mathbf{u}_n+b_1\mathbf{u}_1+\cdots+b_n\mathbf{u}_n)\\
&=T((a_1+b_1)\mathbf{u}_1+\cdots+(a_n+b_n)\mathbf{u}_n)&&\text{Definition VS}\\
&=(a_1+b_1)\mathbf{v}_1+\cdots+(a_n+b_n)\mathbf{v}_n&&\text{Definition of }T\\
&=a_1\mathbf{v}_1+\cdots+a_n\mathbf{v}_n+b_1\mathbf{v}_1+\cdots+b_n\mathbf{v}_n&&\text{Definition VS}\\
&=T(\mathbf{w})+T(\mathbf{x})
\end{aligned}
\]

Let $\alpha\in\mathbb{C}$ be any scalar. Then

\[
\begin{aligned}
T(\alpha\mathbf{w})&=T(\alpha(a_1\mathbf{u}_1+a_2\mathbf{u}_2+a_3\mathbf{u}_3+\cdots+a_n\mathbf{u}_n))\\
&=T(\alpha a_1\mathbf{u}_1+\alpha a_2\mathbf{u}_2+\alpha a_3\mathbf{u}_3+\cdots+\alpha a_n\mathbf{u}_n)&&\text{Definition VS}\\
&=\alpha a_1\mathbf{v}_1+\alpha a_2\mathbf{v}_2+\alpha a_3\mathbf{v}_3+\cdots+\alpha a_n\mathbf{v}_n&&\text{Definition of }T\\
&=\alpha(a_1\mathbf{v}_1+a_2\mathbf{v}_2+a_3\mathbf{v}_3+\cdots+a_n\mathbf{v}_n)&&\text{Definition VS}\\
&=\alpha T(\mathbf{w})
\end{aligned}
\]

So by Definition LT, $T$ is a linear transformation.

Is $T$ unique (among all linear transformations that take the $\mathbf{u}_i$ to the $\mathbf{v}_i$)? Applying Proof Technique U, we posit the existence of a second linear transformation, $S\colon U\to V$ such that $S(\mathbf{u}_i)=\mathbf{v}_i$, $1\le i\le n$. Again, let $\mathbf{w}\in U$ represent an arbitrary element of $U$ and let $a_1,\,a_2,\,a_3,\,\ldots,\,a_n$ be the scalars provided by Theorem VRRB (relative to $B$). We have,

\[
\begin{aligned}
T(\mathbf{w})&=T(a_1\mathbf{u}_1+a_2\mathbf{u}_2+a_3\mathbf{u}_3+\cdots+a_n\mathbf{u}_n)&&\text{Theorem VRRB}\\
&=a_1T(\mathbf{u}_1)+a_2T(\mathbf{u}_2)+a_3T(\mathbf{u}_3)+\cdots+a_nT(\mathbf{u}_n)&&\text{Theorem LTLC}\\
&=a_1\mathbf{v}_1+a_2\mathbf{v}_2+a_3\mathbf{v}_3+\cdots+a_n\mathbf{v}_n&&\text{Definition of }T\\
&=a_1S(\mathbf{u}_1)+a_2S(\mathbf{u}_2)+a_3S(\mathbf{u}_3)+\cdots+a_nS(\mathbf{u}_n)&&\text{Definition of }S\\
&=S(a_1\mathbf{u}_1+a_2\mathbf{u}_2+a_3\mathbf{u}_3+\cdots+a_n\mathbf{u}_n)&&\text{Theorem LTLC}\\
&=S(\mathbf{w})&&\text{Theorem VRRB}
\end{aligned}
\]

So the output of $T$ and $S$ agree on every input, which means they are equal as functions, $T=S$. So $T$ is unique. ■


You  might  recall  facts  from  analytic  geometry,  such  as  “any  two  points  determine 
a line”  and  “any  three  non-collinear  points  determine  a parabola.”  Theorem  LTDB 
has  much  of  the  same  feel.  By  specifying  the  n outputs  for  inputs  from  a basis,  an 
entire linear transformation is determined. The analogy is not perfect, but the style of these facts is not very dissimilar from Theorem LTDB.

Notice that the statement of Theorem LTDB asserts the existence of a linear transformation with certain properties, while the proof shows us exactly how to define the desired linear transformation. The next two examples show how to compute
values  of  linear  transformations  that  we  create  this  way. 


Example  LTDB1  Linear  transformation  defined  on  a basis 

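Consider the linear transformation $T\colon\mathbb{C}^3\to\mathbb{C}^2$ that is required to have the following three values,

\[
T\left(\begin{bmatrix}1\\0\\0\end{bmatrix}\right)=\begin{bmatrix}2\\1\end{bmatrix}\qquad
T\left(\begin{bmatrix}0\\1\\0\end{bmatrix}\right)=\begin{bmatrix}-1\\4\end{bmatrix}\qquad
T\left(\begin{bmatrix}0\\0\\1\end{bmatrix}\right)=\begin{bmatrix}6\\0\end{bmatrix}
\]

Because

\[
B=\left\{\begin{bmatrix}1\\0\\0\end{bmatrix},\,\begin{bmatrix}0\\1\\0\end{bmatrix},\,\begin{bmatrix}0\\0\\1\end{bmatrix}\right\}
\]

is a basis for $\mathbb{C}^3$ (Theorem SUVB), Theorem LTDB says there is a unique linear transformation $T$ that behaves this way.

How do we compute other values of $T$? Consider the input

\[
\mathbf{w}=\begin{bmatrix}2\\-3\\1\end{bmatrix}=(2)\begin{bmatrix}1\\0\\0\end{bmatrix}+(-3)\begin{bmatrix}0\\1\\0\end{bmatrix}+(1)\begin{bmatrix}0\\0\\1\end{bmatrix}
\]

Then

\[
T(\mathbf{w})=(2)\begin{bmatrix}2\\1\end{bmatrix}+(-3)\begin{bmatrix}-1\\4\end{bmatrix}+(1)\begin{bmatrix}6\\0\end{bmatrix}=\begin{bmatrix}13\\-10\end{bmatrix}
\]

Doing it again,

\[
\mathbf{x}=\begin{bmatrix}5\\2\\-3\end{bmatrix}=(5)\begin{bmatrix}1\\0\\0\end{bmatrix}+(2)\begin{bmatrix}0\\1\\0\end{bmatrix}+(-3)\begin{bmatrix}0\\0\\1\end{bmatrix}
\]

so

\[
T(\mathbf{x})=(5)\begin{bmatrix}2\\1\end{bmatrix}+(2)\begin{bmatrix}-1\\4\end{bmatrix}+(-3)\begin{bmatrix}6\\0\end{bmatrix}=\begin{bmatrix}-10\\13\end{bmatrix}
\]

Any other value of $T$ could be computed in a similar manner. So rather than being given a formula for the outputs of $T$, the requirement that $T$ behave in a certain way for the inputs chosen from a basis of the domain is as sufficient as a formula for computing any value of the function. You might notice some parallels between this example and Example MOLT or Theorem MLTCV. △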
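The recipe in the proof of Theorem LTDB (find the coordinates of the input relative to the basis, then use them as weights on the prescribed outputs) can be sketched in a few lines of plain Python. The sketch assumes the images of the three standard unit vectors are $(2,1)$, $(-1,4)$ and $(6,0)$, which are the values consistent with the two results $T(\mathbf{w})$ and $T(\mathbf{x})$ computed in Example LTDB1; the name `images` is introduced for illustration. Because the basis is the standard one, the coordinates of an input are simply its entries.

```python
# Assumed prescribed outputs T(e1), T(e2), T(e3) for Example LTDB1.
images = [(2, 1), (-1, 4), (6, 0)]


def T(w):
    """Evaluate T at w as the linear combination w[0]*T(e1) + w[1]*T(e2) + w[2]*T(e3)."""
    out = [0, 0]
    for coefficient, image in zip(w, images):
        out[0] += coefficient * image[0]
        out[1] += coefficient * image[1]
    return tuple(out)


print(T((2, -3, 1)))
print(T((5, 2, -3)))
```

For a basis other than the standard one, the extra work is a linear system to find the coefficients; the weighting step is unchanged.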


Example  LTDB2  Linear  transformation  defined  on  a basis 
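Consider the linear transformation $R\colon\mathbb{C}^3\to\mathbb{C}^2$ with three specified values, one for each vector of the set

\[
D=\left\{\begin{bmatrix}1\\2\\1\end{bmatrix},\,\begin{bmatrix}-1\\5\\1\end{bmatrix},\,\begin{bmatrix}3\\1\\4\end{bmatrix}\right\}
\]

You can check that $D$ is a basis for $\mathbb{C}^3$ (make the vectors the columns of a square matrix and check that the matrix is nonsingular, Theorem CNMB). By Theorem LTDB we know there is a unique linear transformation $R$ with the three specified outputs. However, we have to work just a bit harder to take an input vector and express it as a linear combination of the vectors in $D$.

For example, consider,

\[
\mathbf{y}=\begin{bmatrix}8\\-3\\5\end{bmatrix}
\]

Then we must first write $\mathbf{y}$ as a linear combination of the vectors in $D$ and solve for the unknown scalars, to arrive at

\[
\mathbf{y}=\begin{bmatrix}8\\-3\\5\end{bmatrix}=(3)\begin{bmatrix}1\\2\\1\end{bmatrix}+(-2)\begin{bmatrix}-1\\5\\1\end{bmatrix}+(1)\begin{bmatrix}3\\1\\4\end{bmatrix}
\]

Then the proof of Theorem LTDB gives us

\[
R(\mathbf{y})=(3)\,R\left(\begin{bmatrix}1\\2\\1\end{bmatrix}\right)+(-2)\,R\left(\begin{bmatrix}-1\\5\\1\end{bmatrix}\right)+(1)\,R\left(\begin{bmatrix}3\\1\\4\end{bmatrix}\right)
\]

Any other value of $R$ could be computed in a similar manner. △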


Here  is  a third  example  of  a linear  transformation  defined  by  its  action  on  a 
basis,  only  with  more  abstract  vector  spaces  involved. 


Example  LTDB3  Linear  transformation  defined  on  a basis 

The set $W=\{p(x)\in P_3\mid p(1)=0,\ p(3)=0\}\subseteq P_3$ is a subspace of the vector space of polynomials $P_3$. This subspace has $C=\{3-4x+x^2,\ 12-13x+x^3\}$ as a basis (check this!). Suppose we consider the linear transformation $S\colon W\to M_{22}$ with values

\[
S\left(3-4x+x^2\right)=\begin{bmatrix}1&-3\\2&0\end{bmatrix}\qquad
S\left(12-13x+x^3\right)=\begin{bmatrix}0&1\\1&0\end{bmatrix}
\]

By Theorem LTDB we know there is a unique linear transformation with these two values. To illustrate a sample computation of $S$, consider $q(x)=9-6x-5x^2+2x^3$. Verify that $q(x)$ is an element of $W$ (does it have roots at $x=1$ and $x=3$?), then find the scalars needed to write it as a linear combination of the basis vectors in $C$. Because

\[
q(x)=9-6x-5x^2+2x^3=(-5)(3-4x+x^2)+(2)(12-13x+x^3)
\]

the proof of Theorem LTDB gives us

\[
S(q)=(-5)\begin{bmatrix}1&-3\\2&0\end{bmatrix}+(2)\begin{bmatrix}0&1\\1&0\end{bmatrix}=\begin{bmatrix}-5&17\\-8&0\end{bmatrix}
\]

And all the other outputs of $S$ could be computed in the same manner. Every output of $S$ will have a zero in the second row, second column. Can you see why this is so? △


Informally,  we  can  describe  Theorem  LTDB  by  saying  “it  is  enough  to  know 
what  a linear  transformation  does  to  a basis  (of  the  domain).” 


Subsection  PI 
Pre-Images 


The  definition  of  a function  requires  that  for  each  input  in  the  domain  there  is 
exactly  one  output  in  the  codomain.  However,  the  correspondence  does  not  have 
to  behave  the  other  way  around.  An  output  from  the  codomain  could  have  many 
different  inputs  from  the  domain  which  the  transformation  sends  to  that  output, 
or  there  could  be  no  inputs  at  all  which  the  transformation  sends  to  that  output. 
To formalize our discussion of this aspect of linear transformations, we define the
pre-image.


Definition  PI  Pre-Image 

Suppose that $T\colon U\to V$ is a linear transformation. For each $\mathbf{v}$, define the pre-image of $\mathbf{v}$ to be the subset of $U$ given by

\[
T^{-1}(\mathbf{v})=\{\mathbf{u}\in U\mid T(\mathbf{u})=\mathbf{v}\}
\]

□

In other words, $T^{-1}(\mathbf{v})$ is the set of all those vectors in the domain $U$ that get “sent” to the vector $\mathbf{v}$.


Example  SPIAS  Sample  pre-images,  Archetype  S 
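Archetype S is the linear transformation defined by

\[
T\colon\mathbb{C}^3\to M_{22},\qquad
T\left(\begin{bmatrix}a\\b\\c\end{bmatrix}\right)=\begin{bmatrix}a-b&2a+2b+c\\3a+b+c&-2a-6b-2c\end{bmatrix}
\]

We could compute a pre-image for every element of the codomain $M_{22}$. However, even in a free textbook, we do not have the room to do that, so we will compute just two.

Choose

\[
\mathbf{v}=\begin{bmatrix}2&1\\3&2\end{bmatrix}\in M_{22}
\]

for no particular reason. What is $T^{-1}(\mathbf{v})$? Suppose $\mathbf{u}=\begin{bmatrix}u_1\\u_2\\u_3\end{bmatrix}\in T^{-1}(\mathbf{v})$. The condition that $T(\mathbf{u})=\mathbf{v}$ becomes

\[
\begin{bmatrix}2&1\\3&2\end{bmatrix}=\mathbf{v}=T(\mathbf{u})=T\left(\begin{bmatrix}u_1\\u_2\\u_3\end{bmatrix}\right)
=\begin{bmatrix}u_1-u_2&2u_1+2u_2+u_3\\3u_1+u_2+u_3&-2u_1-6u_2-2u_3\end{bmatrix}
\]

Using matrix equality (Definition ME), we arrive at a system of four equations in the three unknowns $u_1,\,u_2,\,u_3$ with an augmented matrix that we can row-reduce in the hunt for solutions,

\[
\begin{bmatrix}1&-1&0&2\\2&2&1&1\\3&1&1&3\\-2&-6&-2&2\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}1&0&\frac{1}{4}&\frac{5}{4}\\0&1&\frac{1}{4}&-\frac{3}{4}\\0&0&0&0\\0&0&0&0\end{bmatrix}
\]

We recognize this system as having infinitely many solutions described by the single free variable $u_3$. Eventually obtaining the vector form of the solutions (Theorem VFSLS), we can describe the preimage precisely as,

\[
\begin{aligned}
T^{-1}(\mathbf{v})&=\{\mathbf{u}\in\mathbb{C}^3\mid T(\mathbf{u})=\mathbf{v}\}\\
&=\left\{\begin{bmatrix}\frac{5}{4}\\-\frac{3}{4}\\0\end{bmatrix}+u_3\begin{bmatrix}-\frac{1}{4}\\-\frac{1}{4}\\1\end{bmatrix}\;\middle|\;u_3\in\mathbb{C}\right\}\\
&=\begin{bmatrix}\frac{5}{4}\\-\frac{3}{4}\\0\end{bmatrix}+\left\langle\left\{\begin{bmatrix}-\frac{1}{4}\\-\frac{1}{4}\\1\end{bmatrix}\right\}\right\rangle
\end{aligned}
\]

This last line is merely a suggestive way of describing the set on the previous line. You might create three or four vectors in the preimage, and evaluate $T$ with each. Was the result what you expected? For a hint of things to come, you might try evaluating $T$ with just the lone vector in the spanning set above. What was the result? Now take a look back at Theorem PSPHS. Hmmmm.

OK, let us compute another preimage, but with a different outcome this time. Choose

\[
\mathbf{v}=\begin{bmatrix}1&1\\2&4\end{bmatrix}\in M_{22}
\]

What is $T^{-1}(\mathbf{v})$? Suppose $\mathbf{u}=\begin{bmatrix}u_1\\u_2\\u_3\end{bmatrix}\in T^{-1}(\mathbf{v})$. That $T(\mathbf{u})=\mathbf{v}$ becomes

\[
\begin{bmatrix}1&1\\2&4\end{bmatrix}=\mathbf{v}=T(\mathbf{u})=T\left(\begin{bmatrix}u_1\\u_2\\u_3\end{bmatrix}\right)
=\begin{bmatrix}u_1-u_2&2u_1+2u_2+u_3\\3u_1+u_2+u_3&-2u_1-6u_2-2u_3\end{bmatrix}
\]

Using matrix equality (Definition ME), we arrive at a system of four equations in the three unknowns $u_1,\,u_2,\,u_3$ with an augmented matrix that we can row-reduce in the hunt for solutions,

\[
\begin{bmatrix}1&-1&0&1\\2&2&1&1\\3&1&1&2\\-2&-6&-2&4\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}1&0&\frac{1}{4}&0\\0&1&\frac{1}{4}&0\\0&0&0&1\\0&0&0&0\end{bmatrix}
\]

By Theorem RCLS we recognize this system as inconsistent. So no vector $\mathbf{u}$ is a member of $T^{-1}(\mathbf{v})$ and so

\[
T^{-1}(\mathbf{v})=\emptyset
\]

△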
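Both pre-image computations in Example SPIAS reduce to row-reducing an augmented matrix, which can be sketched with exact rational arithmetic from Python's standard `fractions` module. The routine below is a generic Gauss-Jordan reduction written for this illustration, not code from the text, and the names `rref`, `augmented` and `is_consistent` are hypothetical.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form of a matrix given as a list of rows (exact arithmetic)."""
    M = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(n_cols):
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        candidates = [r for r in range(pivot_row, n_rows) if M[r][col] != 0]
        if not candidates:
            continue
        r0 = candidates[0]
        M[pivot_row], M[r0] = M[r0], M[pivot_row]
        lead = M[pivot_row][col]
        M[pivot_row] = [v / lead for v in M[pivot_row]]      # make the leading entry 1
        for r in range(n_rows):
            if r != pivot_row and M[r][col] != 0:             # clear the rest of the column
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return M


def augmented(v):
    """Augmented matrix of T(u) = v for Archetype S, with v = ((v11, v12), (v21, v22))."""
    (v11, v12), (v21, v22) = v
    return [[1, -1, 0, v11],
            [2, 2, 1, v12],
            [3, 1, 1, v21],
            [-2, -6, -2, v22]]


def is_consistent(R):
    """A reduced system is inconsistent exactly when some row reads 0 = nonzero."""
    return not any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in R)


R1 = rref(augmented(((2, 1), (3, 2))))   # first choice of v in the example
R2 = rref(augmented(((1, 1), (2, 4))))   # second choice of v in the example
print(is_consistent(R1), is_consistent(R2))
```

A consistent reduced matrix with a free variable signals an infinite pre-image, while an inconsistent one signals the empty set, matching the two outcomes in the example.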


The preimage is just a set; it is almost never a subspace of $U$ (you might think about just when $T^{-1}(\mathbf{v})$ is a subspace, see Exercise ILT.T10). We will describe its properties going forward, and it will be central to the main ideas of this chapter.


Subsection  NLTFO 

New  Linear  Transformations  From  Old 

We can combine linear transformations in natural ways to create new linear transformations.
So we will define these combinations and then prove that the results
really  are  still  linear  transformations.  First  the  sum  of  two  linear  transformations. 

Definition  LTA  Linear  Transformation  Addition 

Suppose that $T\colon U\to V$ and $S\colon U\to V$ are two linear transformations with the same domain and codomain. Then their sum is the function $T+S\colon U\to V$ whose outputs are defined by

\[
(T+S)(\mathbf{u})=T(\mathbf{u})+S(\mathbf{u})
\]

□

Notice  that  the  first  plus  sign  in  the  definition  is  the  operation  being  defined, 
while  the  second  one  is  the  vector  addition  in  V . (Vector  addition  in  U will  appear 
just  now  in  the  proof  that  T+S  is  a linear  transformation.)  Definition  LTA  only 
provides  a function.  It  would  be  nice  to  know  that  when  the  constituents  (T,  S)  are 
linear transformations, then so too is $T+S$.

Theorem SLTLT Sum of Linear Transformations is a Linear Transformation
Suppose that $T\colon U\to V$ and $S\colon U\to V$ are two linear transformations with the same domain and codomain. Then $T+S\colon U\to V$ is a linear transformation.

Proof. We simply check the defining properties of a linear transformation (Definition LT). This is a good place to consistently ask yourself which objects are being combined with which operations.

\[
\begin{aligned}
(T+S)(\mathbf{x}+\mathbf{y})&=T(\mathbf{x}+\mathbf{y})+S(\mathbf{x}+\mathbf{y})&&\text{Definition LTA}\\
&=T(\mathbf{x})+T(\mathbf{y})+S(\mathbf{x})+S(\mathbf{y})&&\text{Definition LT}\\
&=T(\mathbf{x})+S(\mathbf{x})+T(\mathbf{y})+S(\mathbf{y})&&\text{Property C in }V\\
&=(T+S)(\mathbf{x})+(T+S)(\mathbf{y})&&\text{Definition LTA}
\end{aligned}
\]

and

\[
\begin{aligned}
(T+S)(\alpha\mathbf{x})&=T(\alpha\mathbf{x})+S(\alpha\mathbf{x})&&\text{Definition LTA}\\
&=\alpha T(\mathbf{x})+\alpha S(\mathbf{x})&&\text{Definition LT}\\
&=\alpha(T(\mathbf{x})+S(\mathbf{x}))&&\text{Property DVA in }V\\
&=\alpha(T+S)(\mathbf{x})&&\text{Definition LTA}
\end{aligned}
\]

■

Example  STLT  Sum  of  two  linear  transformations 
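Suppose that $T\colon\mathbb{C}^2\to\mathbb{C}^3$ and $S\colon\mathbb{C}^2\to\mathbb{C}^3$ are defined by

\[
T\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}\right)=\begin{bmatrix}x_1+2x_2\\3x_1-4x_2\\5x_1+2x_2\end{bmatrix}\qquad
S\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}\right)=\begin{bmatrix}4x_1-x_2\\x_1+3x_2\\-7x_1+5x_2\end{bmatrix}
\]

Then by Definition LTA, we have

\[
(T+S)\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}\right)
=T\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}\right)+S\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}\right)
=\begin{bmatrix}x_1+2x_2\\3x_1-4x_2\\5x_1+2x_2\end{bmatrix}+\begin{bmatrix}4x_1-x_2\\x_1+3x_2\\-7x_1+5x_2\end{bmatrix}
=\begin{bmatrix}5x_1+x_2\\4x_1-x_2\\-2x_1+7x_2\end{bmatrix}
\]

and by Theorem SLTLT we know $T+S$ is also a linear transformation from $\mathbb{C}^2$ to $\mathbb{C}^3$. △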


Definition  LTSM  Linear  Transformation  Scalar  Multiplication 

Suppose that $T\colon U\to V$ is a linear transformation and $\alpha\in\mathbb{C}$. Then the scalar multiple is the function $\alpha T\colon U\to V$ whose outputs are defined by

\[
(\alpha T)(\mathbf{u})=\alpha T(\mathbf{u})
\]

□

Given  that  T is  a linear  transformation,  it  would  be  nice  to  know  that  aT  is  also 
a linear  transformation. 

Theorem  MLTLT  Multiple  of  a Linear  Transformation  is  a Linear  Transformation 
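Suppose that $T\colon U\to V$ is a linear transformation and $\alpha\in\mathbb{C}$. Then $(\alpha T)\colon U\to V$ is a linear transformation.

Proof. We simply check the defining properties of a linear transformation (Definition LT). This is another good place to consistently ask yourself which objects are being combined with which operations.

\[
\begin{aligned}
(\alpha T)(\mathbf{x}+\mathbf{y})&=\alpha(T(\mathbf{x}+\mathbf{y}))&&\text{Definition LTSM}\\
&=\alpha(T(\mathbf{x})+T(\mathbf{y}))&&\text{Definition LT}\\
&=\alpha T(\mathbf{x})+\alpha T(\mathbf{y})&&\text{Property DVA in }V\\
&=(\alpha T)(\mathbf{x})+(\alpha T)(\mathbf{y})&&\text{Definition LTSM}
\end{aligned}
\]

and

\[
\begin{aligned}
(\alpha T)(\beta\mathbf{x})&=\alpha T(\beta\mathbf{x})&&\text{Definition LTSM}\\
&=\alpha(\beta T(\mathbf{x}))&&\text{Definition LT}\\
&=(\alpha\beta)T(\mathbf{x})&&\text{Property SMA in }V\\
&=(\beta\alpha)T(\mathbf{x})&&\text{Commutativity in }\mathbb{C}\\
&=\beta(\alpha T(\mathbf{x}))&&\text{Property SMA in }V\\
&=\beta((\alpha T)(\mathbf{x}))&&\text{Definition LTSM}
\end{aligned}
\]

■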

Example  SMLT  Scalar  multiple  of  a linear  transformation 
Suppose that $T\colon\mathbb{C}^4\to\mathbb{C}^3$ is defined by

\[
T\left(\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix}\right)=\begin{bmatrix}x_1+2x_2-x_3+2x_4\\x_1+5x_2-3x_3+x_4\\-2x_1+3x_2-4x_3+2x_4\end{bmatrix}
\]

For the sake of an example, choose $\alpha=2$, so by Definition LTSM, we have

\[
(\alpha T)\left(\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix}\right)
=2\,T\left(\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix}\right)
=2\begin{bmatrix}x_1+2x_2-x_3+2x_4\\x_1+5x_2-3x_3+x_4\\-2x_1+3x_2-4x_3+2x_4\end{bmatrix}
=\begin{bmatrix}2x_1+4x_2-2x_3+4x_4\\2x_1+10x_2-6x_3+2x_4\\-4x_1+6x_2-8x_3+4x_4\end{bmatrix}
\]

and by Theorem MLTLT we know $2T$ is also a linear transformation from $\mathbb{C}^4$ to $\mathbb{C}^3$. △


Now, let us imagine we have two vector spaces, $U$ and $V$, and we collect every possible linear transformation from $U$ to $V$ into one big set, and call it $\mathcal{LT}(U, V)$. Definition LTA and Definition LTSM tell us how we can “add” and “scalar multiply” two elements of $\mathcal{LT}(U, V)$. Theorem SLTLT and Theorem MLTLT tell us that if we do these operations, then the resulting functions are linear transformations that are also in $\mathcal{LT}(U, V)$. Hmmmm, sounds like a vector space to me! A set of objects, an addition and a scalar multiplication. Why not?

Theorem VSLT Vector Space of Linear Transformations

Suppose that $U$ and $V$ are vector spaces. Then the set of all linear transformations from $U$ to $V$, $\mathcal{LT}(U, V)$, is a vector space when the operations are those given in Definition LTA and Definition LTSM.


Proof.  Theorem  SLTLT  and  Theorem  MLTLT  provide  two  of  the  ten  properties  in 
Definition  VS.  However,  we  still  need  to  verify  the  remaining  eight  properties.  By 
and  large,  the  proofs  are  straightforward  and  rely  on  concocting  the  obvious  object, 
or  by  reducing  the  question  to  the  same  vector  space  property  in  the  vector  space  V. 

The zero vector is of some interest, though. What linear transformation would we add to any other linear transformation, so as to keep the second one unchanged? The answer is $Z\colon U \to V$ defined by $Z(\mathbf{u}) = \mathbf{0}_V$ for every $\mathbf{u} \in U$. Notice how we do not need to know any of the specifics about $U$ and $V$ to make this definition of $Z$. ■


Definition  LTC  Linear  Transformation  Composition 

Suppose that $T\colon U \to V$ and $S\colon V \to W$ are linear transformations. Then the composition of $S$ and $T$ is the function $(S \circ T)\colon U \to W$ whose outputs are defined by

\[
(S \circ T)(\mathbf{u}) = S(T(\mathbf{u}))
\]

□ 

Given that $T$ and $S$ are linear transformations, it would be nice to know that $S \circ T$ is also a linear transformation.


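Theorem CLTLT Composition of Linear Transformations is a Linear Transformation

Suppose that $T\colon U \to V$ and $S\colon V \to W$ are linear transformations. Then $(S \circ T)\colon U \to W$ is a linear transformation.

Proof. We simply check the defining properties of a linear transformation (Definition LT).

\begin{align*}
(S \circ T)(\mathbf{x} + \mathbf{y}) &= S(T(\mathbf{x} + \mathbf{y})) && \text{Definition LTC} \\
&= S(T(\mathbf{x}) + T(\mathbf{y})) && \text{Definition LT for } T \\
&= S(T(\mathbf{x})) + S(T(\mathbf{y})) && \text{Definition LT for } S \\
&= (S \circ T)(\mathbf{x}) + (S \circ T)(\mathbf{y}) && \text{Definition LTC}
\end{align*}

and

\begin{align*}
(S \circ T)(\alpha\mathbf{x}) &= S(T(\alpha\mathbf{x})) && \text{Definition LTC} \\
&= S(\alpha T(\mathbf{x})) && \text{Definition LT for } T \\
&= \alpha S(T(\mathbf{x})) && \text{Definition LT for } S \\
&= \alpha (S \circ T)(\mathbf{x}) && \text{Definition LTC}
\end{align*}
■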

Example  CTLT  Composition  of  two  linear  transformations 
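Suppose that $T\colon \mathbb{C}^2 \to \mathbb{C}^4$ and $S\colon \mathbb{C}^4 \to \mathbb{C}^3$ are defined by

\[
T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1 + 2x_2 \\ 3x_1 - 4x_2 \\ 5x_1 + 2x_2 \\ 6x_1 - 3x_2 \end{bmatrix}
\qquad
S\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 - x_2 + x_3 - x_4 \\ 5x_1 - 3x_2 + 8x_3 - 2x_4 \\ -4x_1 + 3x_2 - 4x_3 + 5x_4 \end{bmatrix}
\]

Then by Definition LTC

\begin{align*}
(S \circ T)\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right)
&= S\left(T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right)\right) \\
&= S\left(\begin{bmatrix} x_1 + 2x_2 \\ 3x_1 - 4x_2 \\ 5x_1 + 2x_2 \\ 6x_1 - 3x_2 \end{bmatrix}\right) \\
&= \begin{bmatrix}
2(x_1 + 2x_2) - (3x_1 - 4x_2) + (5x_1 + 2x_2) - (6x_1 - 3x_2) \\
5(x_1 + 2x_2) - 3(3x_1 - 4x_2) + 8(5x_1 + 2x_2) - 2(6x_1 - 3x_2) \\
-4(x_1 + 2x_2) + 3(3x_1 - 4x_2) - 4(5x_1 + 2x_2) + 5(6x_1 - 3x_2)
\end{bmatrix} \\
&= \begin{bmatrix} -2x_1 + 13x_2 \\ 24x_1 + 44x_2 \\ 15x_1 - 43x_2 \end{bmatrix}
\end{align*}

and by Theorem CLTLT $S \circ T$ is a linear transformation from $\mathbb{C}^2$ to $\mathbb{C}^3$. △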
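Definition LTC says nothing more than "apply $T$, then apply $S$", which is easy to mirror in code. The sketch below is an illustration only and not part of the original text; the names `compose`, `S_after_T` and `closed_form` are assumptions made for this example.

```python
def T(x1, x2):
    """T from Example CTLT, a function C^2 -> C^4."""
    return (x1 + 2*x2, 3*x1 - 4*x2, 5*x1 + 2*x2, 6*x1 - 3*x2)

def S(x1, x2, x3, x4):
    """S from Example CTLT, a function C^4 -> C^3."""
    return (2*x1 - x2 + x3 - x4,
            5*x1 - 3*x2 + 8*x3 - 2*x4,
            -4*x1 + 3*x2 - 4*x3 + 5*x4)

def compose(outer, inner):
    """Return the map u -> outer(inner(u)), as in Definition LTC."""
    return lambda *u: outer(*inner(*u))

S_after_T = compose(S, T)

def closed_form(x1, x2):
    """The simplified formula for S o T obtained in the example."""
    return (-2*x1 + 13*x2, 24*x1 + 44*x2, 15*x1 - 43*x2)

# Spot-check the simplification on a few inputs.
samples = [(1, 0), (0, 1), (3, -2), (2 - 1j, 1 + 4j)]
agree = all(S_after_T(*u) == closed_form(*u) for u in samples)
print(S_after_T(1, 0), S_after_T(0, 1), agree)
```

The outputs on the two standard basis vectors are the columns one would expect in a matrix representation of $S \circ T$.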


Here  is  an  interesting  exercise  that  will  presage  an  important  result  later.  In 
Example  STLT  compute  (via  Theorem  MLTCV)  the  matrix  of  T,  S and  T + S.  Do 
you  see  a relationship  between  these  three  matrices? 

In  Example  SMLT  compute  (via  Theorem  MLTCV)  the  matrix  of  T and  2 T.  Do 
you  see  a relationship  between  these  two  matrices? 

Here is the tough one. In Example CTLT compute (via Theorem MLTCV) the matrix of $T$, $S$ and $S \circ T$. Do you see a relationship between these three matrices???

Reading  Questions 


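1. Is the function below a linear transformation? Why or why not?

\[
T\colon \mathbb{C}^3 \to \mathbb{C}^2, \quad
T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 3x_1 - x_2 + x_3 \\ 8x_2 - 6 \end{bmatrix}
\]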


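2. Determine the matrix representation of the linear transformation $S$ below.

\[
S\colon \mathbb{C}^2 \to \mathbb{C}^3, \quad
S\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 3x_1 + 5x_2 \\ 8x_1 - 3x_2 \\ -4x_1 \end{bmatrix}
\]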


3.  Theorem  LTLC  has  a fairly  simple  proof.  Yet  the  result  itself  is  very  powerful.  Comment 
on  why  we  might  say  this. 


Exercises 

C15  The  archetypes  below  are  all  linear  transformations  whose  domains  and  codomains 
are  vector  spaces  of  column  vectors  (Definition  VSCV).  For  each  one,  compute  the  matrix 
representation  described  in  the  proof  of  Theorem  MLTCV. 


Archetype  M,  Archetype  N,  Archetype  O,  Archetype  P,  Archetype  Q,  Archetype  R 

"3x  + 2 y + z 

016^  Find  the  matrix  representation  of  T : C3  — > C4,  T [ y ) = X + ^3^ 


X 

\ 

y 

= 

\ 

z 

) 

C20† Let $\mathbf{w} = \begin{bmatrix} -3 \\ 1 \\ 4 \end{bmatrix}$. Referring to Example MOLT, compute $S(\mathbf{w})$ two different ways. First use the definition of $S$, then compute the matrix-vector product $C\mathbf{w}$ (Definition MVP).

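C25† Define the linear transformation

\[
T\colon \mathbb{C}^3 \to \mathbb{C}^2, \quad
T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 - x_2 + 5x_3 \\ -4x_1 + 2x_2 - 10x_3 \end{bmatrix}
\]

Verify that $T$ is a linear transformation.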

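C26† Verify that the function below is a linear transformation.

\[
T\colon P_2 \to \mathbb{C}^2, \quad
T\left(a + bx + cx^2\right) = \begin{bmatrix} 2a - b \\ b + c \end{bmatrix}
\]

C30† Define the linear transformation

\[
T\colon \mathbb{C}^3 \to \mathbb{C}^2, \quad
T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 - x_2 + 5x_3 \\ -4x_1 + 2x_2 - 10x_3 \end{bmatrix}
\]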

Compute  the  preimages,  T 


and  T 1 


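C31† For the linear transformation $S$ compute the pre-images.

\[
S\colon \mathbb{C}^3 \to \mathbb{C}^3, \quad
S\left(\begin{bmatrix} a \\ b \\ c \end{bmatrix}\right) = \begin{bmatrix} a - 2b - c \\ 3a - b + 2c \\ a + b + 2c \end{bmatrix}
\]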

-2 

5 

3 


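C40† If $T\colon \mathbb{C}^2 \to \mathbb{C}^2$ satisfies $T(\ldots) = \ldots$ and $T(\ldots) = \ldots$, find $T(\ldots)$.

C41† If $T\colon \mathbb{C}^2 \to \mathbb{C}^3$ satisfies $T\left(\begin{bmatrix} 2 \\ 3 \end{bmatrix}\right) = \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}$ and $T\left(\begin{bmatrix} 3 \\ 4 \end{bmatrix}\right) = \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix}$, find the matrix representation of $T$.

C42† Define $T\colon M_{22} \to \mathbb{C}^1$ by $T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = a + b + c - d$. Find the pre-image $T^{-1}(3)$.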


C43† Define $T\colon P_3 \to P_2$ by $T\left(a + bx + cx^2 + dx^3\right) = b + 2cx + 3dx^2$. Find the pre-image of $\mathbf{0}$. Does this linear transformation seem familiar?

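M10† Define two linear transformations, $T\colon \mathbb{C}^4 \to \mathbb{C}^3$ and $S\colon \mathbb{C}^3 \to \mathbb{C}^2$ by

\[
S\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_1 - 2x_2 + 3x_3 \\ 5x_1 + 4x_2 + 2x_3 \end{bmatrix}
\qquad
T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\right) = \begin{bmatrix} -x_1 + 3x_2 + x_3 + 9x_4 \\ 2x_1 + x_3 + 7x_4 \\ 4x_1 + 2x_2 + x_3 + 2x_4 \end{bmatrix}
\]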


Using  the  proof  of  Theorem  MLTCV  compute  the  matrix  representations  of  the  three  linear 
transformations  T,  S and  S o T.  Discover  and  comment  on  the  relationship  between  these 
three  matrices. 


M60 Suppose $U$ and $V$ are vector spaces and define a function $Z\colon U \to V$ by $Z(\mathbf{u}) = \mathbf{0}_V$ for every $\mathbf{u} \in U$. Prove that $Z$ is a (stupid) linear transformation. (See Exercise ILT.M60, Exercise SLT.M60, Exercise IVLT.M60.)

T20  Use  the  conclusion  of  Theorem  LTLC  to  motivate  a new  definition  of  a linear 
transformation.  Then  prove  that  your  new  definition  is  equivalent  to  Definition  LT.  (Proof 
Technique  D and  Proof  Technique  E might  be  helpful  if  you  are  not  sure  what  you  are 
being  asked  to  prove  here.) 


Theorem SER established three properties of matrix similarity that are collectively known as the defining properties of an “equivalence relation”. Exercises T30 and T31 extend this idea to linear transformations.

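T30 Suppose that $T\colon U \to V$ is a linear transformation. Say that two vectors from $U$, $\mathbf{x}$ and $\mathbf{y}$, are related exactly when $T(\mathbf{x}) = T(\mathbf{y})$ in $V$. Prove the three properties of an equivalence relation on $U$: (a) for any $\mathbf{x} \in U$, $\mathbf{x}$ is related to $\mathbf{x}$, (b) if $\mathbf{x}$ is related to $\mathbf{y}$, then $\mathbf{y}$ is related to $\mathbf{x}$, and (c) if $\mathbf{x}$ is related to $\mathbf{y}$ and $\mathbf{y}$ is related to $\mathbf{z}$, then $\mathbf{x}$ is related to $\mathbf{z}$.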

T31† Equivalence relations always create a partition of the set they are defined on, via a construction called equivalence classes. For the relation in the previous problem, the equivalence classes are the pre-images. Prove directly that the collection of pre-images partitions $U$ by showing that (a) every $\mathbf{x} \in U$ is contained in some pre-image, and that (b) any two different pre-images do not have any elements in common.


Section  ILT 

Injective  Linear  Transformations 

Some  linear  transformations  possess  one,  or  both,  of  two  key  properties,  which  go 
by  the  names  injective  and  surjective.  We  will  see  that  they  are  closely  related  to 
ideas  like  linear  independence  and  spanning,  and  subspaces  like  the  null  space  and 
the  column  space.  In  this  section  we  will  define  an  injective  linear  transformation 
and  analyze  the  resulting  consequences.  The  next  section  will  do  the  same  for  the 
surjective  property.  In  the  final  section  of  this  chapter  we  will  see  what  happens 
when  we  have  the  two  properties  simultaneously. 

Subsection  ILT 

Injective  Linear  Transformations 

As  usual,  we  lead  with  a definition. 

Definition  ILT  Injective  Linear  Transformation 

Suppose $T\colon U \to V$ is a linear transformation. Then $T$ is injective if whenever $T(\mathbf{x}) = T(\mathbf{y})$, then $\mathbf{x} = \mathbf{y}$. □

Given an arbitrary function, it is possible for two different inputs to yield the same output (think about the function $f(x) = x^2$ and the inputs $x = 3$ and $x = -3$). For an
injective  function,  this  never  happens.  If  we  have  equal  outputs  (T  (x)  = T (y))  then 
we  must  have  achieved  those  equal  outputs  by  employing  equal  inputs  (x  = y).  Some 
authors  prefer  the  term  one-to-one  where  we  use  injective,  and  we  will  sometimes 
refer  to  an  injective  linear  transformation  as  an  injection. 

Subsection  EILT 

Examples  of  Injective  Linear  Transformations 

It  is  perhaps  most  instructive  to  examine  a linear  transformation  that  is  not  injective 
first. 

Example  NIAQ  Not  injective,  Archetype  Q 
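Archetype Q is the linear transformation

\[
T\colon \mathbb{C}^5 \to \mathbb{C}^5, \quad
T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}\right)
= \begin{bmatrix}
-2x_1 + 3x_2 + 3x_3 - 6x_4 + 3x_5 \\
-16x_1 + 9x_2 + 12x_3 - 28x_4 + 28x_5 \\
-19x_1 + 7x_2 + 14x_3 - 32x_4 + 37x_5 \\
-21x_1 + 9x_2 + 15x_3 - 35x_4 + 39x_5 \\
-9x_1 + 5x_2 + 7x_3 - 16x_4 + 16x_5
\end{bmatrix}
\]

Notice that for

\[
\mathbf{x} = \begin{bmatrix} 1 \\ 3 \\ -1 \\ 2 \\ 4 \end{bmatrix}
\qquad
\mathbf{y} = \begin{bmatrix} 4 \\ 7 \\ 0 \\ 5 \\ 7 \end{bmatrix}
\]

we have

\[
T\left(\begin{bmatrix} 1 \\ 3 \\ -1 \\ 2 \\ 4 \end{bmatrix}\right) = \begin{bmatrix} 4 \\ 55 \\ 72 \\ 77 \\ 31 \end{bmatrix}
\qquad
T\left(\begin{bmatrix} 4 \\ 7 \\ 0 \\ 5 \\ 7 \end{bmatrix}\right) = \begin{bmatrix} 4 \\ 55 \\ 72 \\ 77 \\ 31 \end{bmatrix}
\]

So we have two vectors from the domain, $\mathbf{x} \neq \mathbf{y}$, yet $T(\mathbf{x}) = T(\mathbf{y})$, in violation of Definition ILT. This is another example where you should not concern yourself with how $\mathbf{x}$ and $\mathbf{y}$ were selected, as this will be explained shortly. However, do understand why these two vectors provide enough evidence to conclude that $T$ is not injective. △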
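The two evaluations in Example NIAQ are tedious by hand but quick to confirm by machine. Here is a small Python sketch, offered as a check and not as part of the original text; the names `Q` and `apply` are illustrative assumptions, with the rows of `Q` holding the coefficients of Archetype Q as transcribed from the example.

```python
# Coefficients of Archetype Q, one row per output entry.
Q = [
    [-2, 3, 3, -6, 3],
    [-16, 9, 12, -28, 28],
    [-19, 7, 14, -32, 37],
    [-21, 9, 15, -35, 39],
    [-9, 5, 7, -16, 16],
]

def apply(A, v):
    """Evaluate the transformation with coefficient rows A at the input v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

x = [1, 3, -1, 2, 4]
y = [4, 7, 0, 5, 7]

Tx = apply(Q, x)
Ty = apply(Q, y)

# Different inputs, identical outputs: T is not injective.
print(x != y, Tx, Ty)
```

A single such pair is all that is needed, since Definition ILT is a statement about every pair of inputs.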


Here  is  a cartoon  of  a non-injective  linear  transformation.  Notice  that  the  central 
feature  of  this  cartoon  is  that  T (u)  = v = T (w).  Even  though  this  happens  again 
with  some  unnamed  vectors,  it  only  takes  one  occurrence  to  destroy  the  possibility 
of  injectivity.  Note  also  that  the  two  vectors  displayed  in  the  bottom  of  V have  no 
bearing, either way, on the injectivity of $T$.

Diagram NILT: Non-Injective Linear Transformation


To  show  that  a linear  transformation  is  not  injective,  it  is  enough  to  find  a single 
pair  of  inputs  that  get  sent  to  the  identical  output,  as  in  Example  NIAQ.  However,  to 
show  that  a linear  transformation  is  injective  we  must  establish  that  this  coincidence 
of  outputs  never  occurs.  Here  is  an  example  that  shows  how  to  establish  this. 

Example  IAR  Injective,  Archetype  R 
Archetype R is the linear transformation

\[
T\colon \mathbb{C}^5 \to \mathbb{C}^5, \quad
T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}\right)
= \begin{bmatrix}
-65x_1 + 128x_2 + 10x_3 - 262x_4 + 40x_5 \\
36x_1 - 73x_2 - x_3 + 151x_4 - 16x_5 \\
-44x_1 + 88x_2 + 5x_3 - 180x_4 + 24x_5 \\
34x_1 - 68x_2 - 3x_3 + 140x_4 - 18x_5 \\
12x_1 - 24x_2 - x_3 + 49x_4 - 5x_5
\end{bmatrix}
\]

To establish that $T$ is injective we must begin with the assumption that $T(\mathbf{x}) = T(\mathbf{y})$ and somehow arrive at the conclusion that $\mathbf{x} = \mathbf{y}$. Here we go,

\begin{align*}
\mathbf{0} &= T(\mathbf{x}) - T(\mathbf{y}) \\
&= T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}\right) - T\left(\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}\right) \\
&= \begin{bmatrix}
-65x_1 + 128x_2 + 10x_3 - 262x_4 + 40x_5 \\
36x_1 - 73x_2 - x_3 + 151x_4 - 16x_5 \\
-44x_1 + 88x_2 + 5x_3 - 180x_4 + 24x_5 \\
34x_1 - 68x_2 - 3x_3 + 140x_4 - 18x_5 \\
12x_1 - 24x_2 - x_3 + 49x_4 - 5x_5
\end{bmatrix}
- \begin{bmatrix}
-65y_1 + 128y_2 + 10y_3 - 262y_4 + 40y_5 \\
36y_1 - 73y_2 - y_3 + 151y_4 - 16y_5 \\
-44y_1 + 88y_2 + 5y_3 - 180y_4 + 24y_5 \\
34y_1 - 68y_2 - 3y_3 + 140y_4 - 18y_5 \\
12y_1 - 24y_2 - y_3 + 49y_4 - 5y_5
\end{bmatrix} \\
&= \begin{bmatrix}
-65(x_1 - y_1) + 128(x_2 - y_2) + 10(x_3 - y_3) - 262(x_4 - y_4) + 40(x_5 - y_5) \\
36(x_1 - y_1) - 73(x_2 - y_2) - (x_3 - y_3) + 151(x_4 - y_4) - 16(x_5 - y_5) \\
-44(x_1 - y_1) + 88(x_2 - y_2) + 5(x_3 - y_3) - 180(x_4 - y_4) + 24(x_5 - y_5) \\
34(x_1 - y_1) - 68(x_2 - y_2) - 3(x_3 - y_3) + 140(x_4 - y_4) - 18(x_5 - y_5) \\
12(x_1 - y_1) - 24(x_2 - y_2) - (x_3 - y_3) + 49(x_4 - y_4) - 5(x_5 - y_5)
\end{bmatrix} \\
&= \begin{bmatrix}
-65 & 128 & 10 & -262 & 40 \\
36 & -73 & -1 & 151 & -16 \\
-44 & 88 & 5 & -180 & 24 \\
34 & -68 & -3 & 140 & -18 \\
12 & -24 & -1 & 49 & -5
\end{bmatrix}
\begin{bmatrix} x_1 - y_1 \\ x_2 - y_2 \\ x_3 - y_3 \\ x_4 - y_4 \\ x_5 - y_5 \end{bmatrix}
\end{align*}

Now we recognize that we have a homogeneous system of 5 equations in 5 variables (the terms $x_i - y_i$ are the variables), so we row-reduce the coefficient matrix to

\[
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]

So the only solution is the trivial solution

\[
x_1 - y_1 = 0 \qquad x_2 - y_2 = 0 \qquad x_3 - y_3 = 0 \qquad x_4 - y_4 = 0 \qquad x_5 - y_5 = 0
\]

and we conclude that indeed $\mathbf{x} = \mathbf{y}$. By Definition ILT, $T$ is injective. △
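The row-reduction step in Example IAR can be reproduced with exact rational arithmetic. The sketch below is one possible way to do it in plain Python and is not part of the original text; the function name `rref` and the variable names are assumptions for illustration, and the routine is a textbook Gauss-Jordan elimination with no attention paid to efficiency.

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced row-echelon form of a matrix given as a list of rows."""
    A = [[Fraction(e) for e in row] for row in rows]
    m, n = len(A), len(A[0])
    pivot_row = 0
    for col in range(n):
        # Look for a nonzero entry in this column, at or below the current pivot row.
        pivot = next((r for r in range(pivot_row, m) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[pivot_row], A[pivot] = A[pivot], A[pivot_row]
        # Scale the pivot row so that the leading entry is 1.
        lead = A[pivot_row][col]
        A[pivot_row] = [e / lead for e in A[pivot_row]]
        # Clear the rest of the column, above and below the pivot.
        for r in range(m):
            if r != pivot_row and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[pivot_row])]
        pivot_row += 1
        if pivot_row == m:
            break
    return A

# Coefficient matrix of Archetype R, transcribed from the example.
R = [
    [-65, 128, 10, -262, 40],
    [36, -73, -1, 151, -16],
    [-44, 88, 5, -180, 24],
    [34, -68, -3, 140, -18],
    [12, -24, -1, 49, -5],
]

identity = [[1 if i == j else 0 for j in range(5)] for i in range(5)]
reduces_to_identity = rref(R) == identity
print(reduces_to_identity)
```

Using `Fraction` instead of floating-point numbers avoids rounding, so a comparison with the identity matrix is exact.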

Here  is  the  cartoon  for  an  injective  linear  transformation.  It  is  meant  to  suggest 
that  we  never  have  two  inputs  associated  with  a single  output.  Again,  the  two  lonely 
vectors  at  the  bottom  of  V have  no  bearing  either  way  on  the  injectivity  of  T. 


Diagram ILT: Injective Linear Transformation

Let us now examine an injective linear transformation between abstract vector spaces.

Example  IAV  Injective,  Archetype  V 
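Archetype V is defined by

\[
T\colon P_3 \to M_{22}, \quad
T\left(a + bx + cx^2 + dx^3\right) = \begin{bmatrix} a + b & a - 2c \\ d & b - d \end{bmatrix}
\]

To establish that the linear transformation is injective, begin by supposing that two polynomial inputs yield the same output matrix,

\[
T\left(a_1 + b_1 x + c_1 x^2 + d_1 x^3\right) = T\left(a_2 + b_2 x + c_2 x^2 + d_2 x^3\right)
\]

Then

\begin{align*}
\mathcal{O} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
&= T\left(a_1 + b_1 x + c_1 x^2 + d_1 x^3\right) - T\left(a_2 + b_2 x + c_2 x^2 + d_2 x^3\right) && \text{Hypothesis} \\
&= T\left((a_1 + b_1 x + c_1 x^2 + d_1 x^3) - (a_2 + b_2 x + c_2 x^2 + d_2 x^3)\right) && \text{Definition LT} \\
&= T\left((a_1 - a_2) + (b_1 - b_2)x + (c_1 - c_2)x^2 + (d_1 - d_2)x^3\right) && \text{Operations in } P_3 \\
&= \begin{bmatrix} (a_1 - a_2) + (b_1 - b_2) & (a_1 - a_2) - 2(c_1 - c_2) \\ (d_1 - d_2) & (b_1 - b_2) - (d_1 - d_2) \end{bmatrix} && \text{Definition of } T
\end{align*}

This single matrix equality translates to the homogeneous system of equations in the variables $a_1 - a_2$, $b_1 - b_2$, $c_1 - c_2$, $d_1 - d_2$,

\begin{align*}
(a_1 - a_2) + (b_1 - b_2) &= 0 \\
(a_1 - a_2) - 2(c_1 - c_2) &= 0 \\
(d_1 - d_2) &= 0 \\
(b_1 - b_2) - (d_1 - d_2) &= 0
\end{align*}

This system of equations can be rewritten as the matrix equation

\[
\begin{bmatrix}
1 & 1 & 0 & 0 \\
1 & 0 & -2 & 0 \\
0 & 0 & 0 & 1 \\
0 & 1 & 0 & -1
\end{bmatrix}
\begin{bmatrix} a_1 - a_2 \\ b_1 - b_2 \\ c_1 - c_2 \\ d_1 - d_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\]

Since the coefficient matrix is nonsingular (check this) the only solution is trivial, i.e.

\[
a_1 - a_2 = 0 \qquad b_1 - b_2 = 0 \qquad c_1 - c_2 = 0 \qquad d_1 - d_2 = 0
\]

so that

\[
a_1 = a_2 \qquad b_1 = b_2 \qquad c_1 = c_2 \qquad d_1 = d_2
\]

so the two inputs must be equal polynomials. By Definition ILT, $T$ is injective. △

Subsection KLT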

Kernel  of  a Linear  Transformation 


For  a linear  transformation  T : U — > V,  the  kernel  is  a subset  of  the  domain  U . 
Informally,  it  is  the  set  of  all  inputs  that  the  transformation  sends  to  the  zero  vector 
of  the  codomain.  It  will  have  some  natural  connections  with  the  null  space  of  a 
matrix,  so  we  will  keep  the  same  notation,  and  if  you  think  about  your  objects,  then 
there  should  be  little  confusion.  Here  is  the  careful  definition. 


Definition  KLT  Kernel  of  a Linear  Transformation 

Suppose $T\colon U \to V$ is a linear transformation. Then the kernel of $T$ is the set

\[
\mathcal{K}(T) = \{\mathbf{u} \in U \mid T(\mathbf{u}) = \mathbf{0}\}
\]


□ 


Notice that the kernel of $T$ is just the preimage of $\mathbf{0}$, $T^{-1}(\mathbf{0})$ (Definition PI).
Here  is  an  example. 


Example  NKAO  Nontrivial  kernel,  Archetype  O 
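Archetype O is the linear transformation

\[
T\colon \mathbb{C}^3 \to \mathbb{C}^5, \quad
T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right)
= \begin{bmatrix}
-x_1 + x_2 - 3x_3 \\
-x_1 + 2x_2 - 4x_3 \\
x_1 + x_2 + x_3 \\
2x_1 + 3x_2 + x_3 \\
x_1 + 2x_3
\end{bmatrix}
\]

To determine the elements of $\mathbb{C}^3$ in $\mathcal{K}(T)$, find those vectors $\mathbf{u}$ such that $T(\mathbf{u}) = \mathbf{0}$, that is,

\begin{align*}
T(\mathbf{u}) &= \mathbf{0} \\
\begin{bmatrix}
-u_1 + u_2 - 3u_3 \\
-u_1 + 2u_2 - 4u_3 \\
u_1 + u_2 + u_3 \\
2u_1 + 3u_2 + u_3 \\
u_1 + 2u_3
\end{bmatrix}
&= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\end{align*}

Vector equality (Definition CVE) leads us to a homogeneous system of 5 equations in the variables $u_i$,

\begin{align*}
-u_1 + u_2 - 3u_3 &= 0 \\
-u_1 + 2u_2 - 4u_3 &= 0 \\
u_1 + u_2 + u_3 &= 0 \\
2u_1 + 3u_2 + u_3 &= 0 \\
u_1 + 2u_3 &= 0
\end{align*}

Row-reducing the coefficient matrix gives

\[
\begin{bmatrix}
1 & 0 & 2 \\
0 & 1 & -1 \\
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{bmatrix}
\]

The kernel of $T$ is the set of solutions to this homogeneous system of equations, which by Theorem BNS can be expressed as

\[
\mathcal{K}(T) = \left\langle \left\{ \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} \right\} \right\rangle
\]
△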
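A quick way to gain confidence in the kernel found in Example NKAO is to feed multiples of the spanning vector back into the transformation. The following Python sketch is only an illustration and not part of the original text; the names `T`, `z`, `multiples_in_kernel` and `outside` are assumptions chosen for this example.

```python
def T(x1, x2, x3):
    """Archetype O, transcribed from Example NKAO, as a function C^3 -> C^5."""
    return (-x1 + x2 - 3*x3,
            -x1 + 2*x2 - 4*x3,
            x1 + x2 + x3,
            2*x1 + 3*x2 + x3,
            x1 + 2*x3)

z = (-2, 1, 1)           # spanning vector read off from the row-reduced matrix
zero = (0, 0, 0, 0, 0)   # zero vector of the codomain

# Every scalar multiple of z, complex scalars included, should land on zero.
multiples_in_kernel = all(
    T(*(c * zi for zi in z)) == zero for c in (1, -3, 2 + 5j)
)

# A vector that is not a multiple of z should not be sent to zero.
outside = T(1, 0, 0)

print(multiples_in_kernel, outside)
```

This confirms membership in the kernel for the sampled scalars; that nothing else belongs to the kernel is what the row-reduction and Theorem BNS establish.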


We know that the span of a set of vectors is always a subspace (Theorem SSS), so the kernel computed in Example NKAO is also a subspace. This is no accident: the kernel of a linear transformation is always a subspace.

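Theorem KLTS Kernel of a Linear Transformation is a Subspace

Suppose that $T\colon U \to V$ is a linear transformation. Then the kernel of $T$, $\mathcal{K}(T)$, is a subspace of $U$.

Proof. We can apply the three-part test of Theorem TSS. First $T(\mathbf{0}_U) = \mathbf{0}_V$ by Theorem LTTZZ, so $\mathbf{0}_U \in \mathcal{K}(T)$ and we know that the kernel is nonempty.

Suppose we assume that $\mathbf{x}, \mathbf{y} \in \mathcal{K}(T)$. Is $\mathbf{x} + \mathbf{y} \in \mathcal{K}(T)$?

\begin{align*}
T(\mathbf{x} + \mathbf{y}) &= T(\mathbf{x}) + T(\mathbf{y}) && \text{Definition LT} \\
&= \mathbf{0} + \mathbf{0} && \mathbf{x}, \mathbf{y} \in \mathcal{K}(T) \\
&= \mathbf{0} && \text{Property Z}
\end{align*}

This qualifies $\mathbf{x} + \mathbf{y}$ for membership in $\mathcal{K}(T)$. So we have additive closure.

Suppose we assume that $\alpha \in \mathbb{C}$ and $\mathbf{x} \in \mathcal{K}(T)$. Is $\alpha\mathbf{x} \in \mathcal{K}(T)$?

\begin{align*}
T(\alpha\mathbf{x}) &= \alpha T(\mathbf{x}) && \text{Definition LT} \\
&= \alpha\mathbf{0} && \mathbf{x} \in \mathcal{K}(T) \\
&= \mathbf{0} && \text{Theorem ZVSM}
\end{align*}

This qualifies $\alpha\mathbf{x}$ for membership in $\mathcal{K}(T)$. So we have scalar closure and Theorem TSS tells us that $\mathcal{K}(T)$ is a subspace of $U$. ■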


Let  us  compute  another  kernel,  now  that  we  know  in  advance  that  it  will  be  a 
subspace. 

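Example TKAP Trivial kernel, Archetype P
Archetype P is the linear transformation

\[
T\colon \mathbb{C}^3 \to \mathbb{C}^5, \quad
T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right)
= \begin{bmatrix}
-x_1 + x_2 + x_3 \\
-x_1 + 2x_2 + 2x_3 \\
x_1 + x_2 + 3x_3 \\
2x_1 + 3x_2 + x_3 \\
-2x_1 + x_2 + 3x_3
\end{bmatrix}
\]

To determine the elements of $\mathbb{C}^3$ in $\mathcal{K}(T)$, find those vectors $\mathbf{u}$ such that $T(\mathbf{u}) = \mathbf{0}$, that is,

\begin{align*}
T(\mathbf{u}) &= \mathbf{0} \\
\begin{bmatrix}
-u_1 + u_2 + u_3 \\
-u_1 + 2u_2 + 2u_3 \\
u_1 + u_2 + 3u_3 \\
2u_1 + 3u_2 + u_3 \\
-2u_1 + u_2 + 3u_3
\end{bmatrix}
&= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\end{align*}

Vector equality (Definition CVE) leads us to a homogeneous system of 5 equations in the variables $u_i$,

\begin{align*}
-u_1 + u_2 + u_3 &= 0 \\
-u_1 + 2u_2 + 2u_3 &= 0 \\
u_1 + u_2 + 3u_3 &= 0 \\
2u_1 + 3u_2 + u_3 &= 0 \\
-2u_1 + u_2 + 3u_3 &= 0
\end{align*}

Row-reducing the coefficient matrix gives

\[
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{bmatrix}
\]

The kernel of $T$ is the set of solutions to this homogeneous system of equations, which is simply the trivial solution $\mathbf{u} = \mathbf{0}$, so

\[
\mathcal{K}(T) = \{\mathbf{0}\} = \left\langle \{\ \} \right\rangle
\]
△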


Our  next  theorem  says  that  if  a preimage  is  a nonempty  set  then  we  can  construct 
it  by  picking  any  one  element  and  adding  on  elements  of  the  kernel. 

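Theorem KPI Kernel and Pre-Image

Suppose $T\colon U \to V$ is a linear transformation and $\mathbf{v} \in V$. If the preimage $T^{-1}(\mathbf{v})$ is nonempty, and $\mathbf{u} \in T^{-1}(\mathbf{v})$ then

\[
T^{-1}(\mathbf{v}) = \{\mathbf{u} + \mathbf{z} \mid \mathbf{z} \in \mathcal{K}(T)\} = \mathbf{u} + \mathcal{K}(T)
\]

Proof. Let $M = \{\mathbf{u} + \mathbf{z} \mid \mathbf{z} \in \mathcal{K}(T)\}$. First, we show that $M \subseteq T^{-1}(\mathbf{v})$. Suppose that $\mathbf{w} \in M$, so $\mathbf{w}$ has the form $\mathbf{w} = \mathbf{u} + \mathbf{z}$, where $\mathbf{z} \in \mathcal{K}(T)$. Then

\begin{align*}
T(\mathbf{w}) &= T(\mathbf{u} + \mathbf{z}) \\
&= T(\mathbf{u}) + T(\mathbf{z}) && \text{Definition LT} \\
&= \mathbf{v} + \mathbf{0} && \mathbf{u} \in T^{-1}(\mathbf{v}),\ \mathbf{z} \in \mathcal{K}(T) \\
&= \mathbf{v} && \text{Property Z}
\end{align*}

which qualifies $\mathbf{w}$ for membership in the preimage of $\mathbf{v}$, $\mathbf{w} \in T^{-1}(\mathbf{v})$.

For the opposite inclusion, suppose $\mathbf{x} \in T^{-1}(\mathbf{v})$. Then,

\begin{align*}
T(\mathbf{x} - \mathbf{u}) &= T(\mathbf{x}) - T(\mathbf{u}) && \text{Definition LT} \\
&= \mathbf{v} - \mathbf{v} && \mathbf{x}, \mathbf{u} \in T^{-1}(\mathbf{v}) \\
&= \mathbf{0}
\end{align*}

This qualifies $\mathbf{x} - \mathbf{u}$ for membership in the kernel of $T$, $\mathcal{K}(T)$. So there is a vector $\mathbf{z} \in \mathcal{K}(T)$ such that $\mathbf{x} - \mathbf{u} = \mathbf{z}$. Rearranging this equation gives $\mathbf{x} = \mathbf{u} + \mathbf{z}$ and so $\mathbf{x} \in M$. So $T^{-1}(\mathbf{v}) \subseteq M$ and we see that $M = T^{-1}(\mathbf{v})$, as desired. ■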
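Theorem KPI can be watched in action on Archetype O, whose kernel was found in Example NKAO. The sketch below is a hedged illustration and not part of the original text: the input `u` is an arbitrary choice, and the names `shift`, `same_output` and `difference` are assumptions made for this example.

```python
def T(x1, x2, x3):
    """Archetype O, transcribed from Example NKAO, as a function C^3 -> C^5."""
    return (-x1 + x2 - 3*x3,
            -x1 + 2*x2 - 4*x3,
            x1 + x2 + x3,
            2*x1 + 3*x2 + x3,
            x1 + 2*x3)

z = (-2, 1, 1)   # spans the kernel of T (Example NKAO)
u = (1, 2, 3)    # an arbitrary input, chosen only for illustration
v = T(*u)        # by construction, u lies in the preimage of v

def shift(u, c, z):
    """Return u + c*z, an element of the set u + K(T) when z is in the kernel."""
    return tuple(ui + c * zi for ui, zi in zip(u, z))

# Forward inclusion: every element of u + K(T) that we sample is sent to v.
same_output = all(T(*shift(u, c, z)) == v for c in range(-3, 4))

# Reverse inclusion: two members of the preimage differ by a kernel element.
x = shift(u, 5, z)
difference = tuple(xi - ui for xi, ui in zip(x, u))

print(v, same_output, T(*difference))
```

Only finitely many members of the preimage are sampled here, so this illustrates the statement; the proof above is what covers every element.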

This  theorem,  and  its  proof,  should  remind  you  very  much  of  Theorem  PSPHS. 
Additionally,  you  might  go  back  and  review  Example  SPIAS.  Can  you  tell  now  which 
is  the  only  preimage  to  be  a subspace? 

Here is the cartoon which describes the “many-to-one” behavior of a typical linear transformation. Presume that $T(\mathbf{u}_i) = \mathbf{v}_i$, for $i = 1, 2, 3$, and as guaranteed by Theorem LTTZZ, $T(\mathbf{0}_U) = \mathbf{0}_V$. Then four pre-images are depicted, each labeled slightly differently. $T^{-1}(\mathbf{v}_2)$ is the most general, employing Theorem KPI to provide two equal descriptions of the set. The most unusual is $T^{-1}(\mathbf{0}_V)$ which is equal to the kernel, $\mathcal{K}(T)$, and hence is a subspace (by Theorem KLTS). The subdivisions of the domain, $U$, are meant to suggest the partitioning of the domain by the collection of pre-images. It also suggests that each pre-image is of similar size or structure, since each is a “shifted” copy of the kernel. Notice that we cannot speak of the dimension of a pre-image, since it is almost never a subspace. Also notice that $\mathbf{x}, \mathbf{y} \in V$ are elements of the codomain with empty pre-images.

Diagram KPI: Kernel and Pre-Image

The  next  theorem  is  one  we  will  cite  frequently,  as  it  characterizes  injections  by 
the  size  of  the  kernel. 

Theorem KILT Kernel of an Injective Linear Transformation

Suppose that $T\colon U \to V$ is a linear transformation. Then $T$ is injective if and only if the kernel of $T$ is trivial, $\mathcal{K}(T) = \{\mathbf{0}\}$.

Proof. ($\Rightarrow$) We assume $T$ is injective and we need to establish that two sets are equal (Definition SE). Since the kernel is a subspace (Theorem KLTS), $\{\mathbf{0}\} \subseteq \mathcal{K}(T)$. To establish the opposite inclusion, suppose $\mathbf{x} \in \mathcal{K}(T)$.

\begin{align*}
T(\mathbf{x}) &= \mathbf{0} && \text{Definition KLT} \\
&= T(\mathbf{0}) && \text{Theorem LTTZZ}
\end{align*}

We can apply Definition ILT to conclude that $\mathbf{x} = \mathbf{0}$. Therefore $\mathcal{K}(T) \subseteq \{\mathbf{0}\}$ and by Definition SE, $\mathcal{K}(T) = \{\mathbf{0}\}$.

($\Leftarrow$) To establish that $T$ is injective, appeal to Definition ILT and begin with the assumption that $T(\mathbf{x}) = T(\mathbf{y})$. Then

\begin{align*}
T(\mathbf{x} - \mathbf{y}) &= T(\mathbf{x}) - T(\mathbf{y}) && \text{Definition LT} \\
&= \mathbf{0} && \text{Hypothesis}
\end{align*}

So $\mathbf{x} - \mathbf{y} \in \mathcal{K}(T)$ by Definition KLT and with the hypothesis that the kernel is trivial we conclude that $\mathbf{x} - \mathbf{y} = \mathbf{0}$. Then

\[
\mathbf{y} = \mathbf{y} + \mathbf{0} = \mathbf{y} + (\mathbf{x} - \mathbf{y}) = \mathbf{x}
\]

thus establishing that $T$ is injective by Definition ILT. ■


You  might  begin  to  think  about  how  Diagram  KPI  would  change  if  the  linear 
transformation  is  injective,  which  would  make  the  kernel  trivial  by  Theorem  KILT. 


Example  NIAQR  Not  injective,  Archetype  Q,  revisited 

We  are  now  in  a position  to  revisit  our  first  example  in  this  section,  Example  NIAQ. 
In  that  example,  we  showed  that  Archetype  Q is  not  injective  by  constructing  two 
vectors,  which  when  used  to  evaluate  the  linear  transformation  provided  the  same 
output,  thus  violating  Definition  ILT.  Just  where  did  those  two  vectors  come  from? 
The key is the vector

\[
\mathbf{z} = \begin{bmatrix} 3 \\ 4 \\ 1 \\ 3 \\ 3 \end{bmatrix}
\]

which you can check is an element of $\mathcal{K}(T)$ for Archetype Q. Choose a vector $\mathbf{x}$ at random, and then compute $\mathbf{y} = \mathbf{x} + \mathbf{z}$ (verify this computation back in Example NIAQ). Then

\begin{align*}
T(\mathbf{y}) &= T(\mathbf{x} + \mathbf{z}) \\
&= T(\mathbf{x}) + T(\mathbf{z}) && \text{Definition LT} \\
&= T(\mathbf{x}) + \mathbf{0} && \mathbf{z} \in \mathcal{K}(T) \\
&= T(\mathbf{x}) && \text{Property Z}
\end{align*}

Whenever the kernel of a linear transformation is nontrivial, we can employ this device and conclude that the linear transformation is not injective. This is another way of viewing Theorem KILT. For an injective linear transformation, the kernel is trivial and our only choice for $\mathbf{z}$ is the zero vector, which will not help us create two different inputs for $T$ that yield identical outputs. For every one of the archetypes that is not injective, there is an example presented of exactly this form. △
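The device described in Example NIAQR, adding a nonzero kernel vector to any input, is mechanical enough to script. This Python sketch is an illustration and not part of the original text; the starting vector `x` is an arbitrary choice, and the names `Q` and `apply` are assumptions, with the rows of `Q` transcribed from Archetype Q in Example NIAQ.

```python
# Coefficients of Archetype Q, one row per output entry.
Q = [
    [-2, 3, 3, -6, 3],
    [-16, 9, 12, -28, 28],
    [-19, 7, 14, -32, 37],
    [-21, 9, 15, -35, 39],
    [-9, 5, 7, -16, 16],
]

def apply(A, v):
    """Evaluate the transformation with coefficient rows A at the input v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

z = [3, 4, 1, 3, 3]       # the kernel vector named in the example
x = [2, -1, 0, 5, 1]      # any input will do; this one is arbitrary
y = [xi + zi for xi, zi in zip(x, z)]

z_in_kernel = apply(Q, z) == [0, 0, 0, 0, 0]
collision = apply(Q, x) == apply(Q, y)

print(z_in_kernel, x != y, collision)
```

Starting instead from the vector $\mathbf{x}$ of Example NIAQ reproduces the vector $\mathbf{y}$ used there.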


Example  NIAO  Not  injective,  Archetype  O 

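In Example NKAO the kernel of Archetype O was determined to be

\[
\left\langle \left\{ \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} \right\} \right\rangle
\]

a subspace of $\mathbb{C}^3$ with dimension 1. Since the kernel is not trivial, Theorem KILT tells us that $T$ is not injective. △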


Example IAP Injective, Archetype P
In Example TKAP it was shown that the linear transformation in Archetype P has a trivial kernel. So by Theorem KILT, $T$ is injective. △


Subsection  ILTLI 

Injective  Linear  Transformations  and  Linear  Independence 

There is a connection between injective linear transformations and linearly independent sets that we will make precise in the next two theorems. However, more
informally,  we  can  get  a feel  for  this  connection  when  we  think  about  how  each 
property  is  defined.  A set  of  vectors  is  linearly  independent  if  the  only  relation  of 
linear  dependence  is  the  trivial  one.  A linear  transformation  is  injective  if  the  only 
way  two  input  vectors  can  produce  the  same  output  is  in  the  trivial  way,  when  both 
input  vectors  are  equal. 

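Theorem ILTLI Injective Linear Transformations and Linear Independence
Suppose that $T\colon U \to V$ is an injective linear transformation and

\[
S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_t\}
\]

is a linearly independent subset of $U$. Then

\[
R = \{T(\mathbf{u}_1), T(\mathbf{u}_2), T(\mathbf{u}_3), \ldots, T(\mathbf{u}_t)\}
\]

is a linearly independent subset of $V$.

Proof. Begin with a relation of linear dependence on $R$ (Definition RLD, Definition LI),

\begin{align*}
a_1 T(\mathbf{u}_1) + a_2 T(\mathbf{u}_2) + a_3 T(\mathbf{u}_3) + \cdots + a_t T(\mathbf{u}_t) &= \mathbf{0} \\
T(a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_t\mathbf{u}_t) &= \mathbf{0} && \text{Theorem LTLC} \\
a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_t\mathbf{u}_t &\in \mathcal{K}(T) && \text{Definition KLT} \\
a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_t\mathbf{u}_t &\in \{\mathbf{0}\} && \text{Theorem KILT} \\
a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_t\mathbf{u}_t &= \mathbf{0} && \text{Definition SET}
\end{align*}

Since this is a relation of linear dependence on the linearly independent set $S$, we can conclude that

\[
a_1 = 0 \qquad a_2 = 0 \qquad a_3 = 0 \qquad \ldots \qquad a_t = 0
\]

and this establishes that $R$ is a linearly independent set. ■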

Theorem ILTB Injective Linear Transformations and Bases
Suppose that $T\colon U \to V$ is a linear transformation and

\[
B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_m\}
\]

is a basis of $U$. Then $T$ is injective if and only if

\[
C = \{T(\mathbf{u}_1), T(\mathbf{u}_2), T(\mathbf{u}_3), \ldots, T(\mathbf{u}_m)\}
\]

is a linearly independent subset of $V$.

Proof. ($\Rightarrow$) Assume $T$ is injective. Since $B$ is a basis, we know $B$ is linearly independent (Definition B). Then Theorem ILTLI says that $C$ is a linearly independent subset of $V$.

($\Leftarrow$) Assume that $C$ is linearly independent. To establish that $T$ is injective, we will show that the kernel of $T$ is trivial (Theorem KILT). Suppose that $\mathbf{u} \in \mathcal{K}(T)$. As an element of $U$, we can write $\mathbf{u}$ as a linear combination of the basis vectors in $B$ (uniquely). So there are scalars, $a_1, a_2, a_3, \ldots, a_m$, such that

\[
\mathbf{u} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_m\mathbf{u}_m
\]

Then,

\begin{align*}
\mathbf{0} &= T(\mathbf{u}) && \text{Definition KLT} \\
&= T(a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_m\mathbf{u}_m) && \text{Definition SSVS} \\
&= a_1 T(\mathbf{u}_1) + a_2 T(\mathbf{u}_2) + a_3 T(\mathbf{u}_3) + \cdots + a_m T(\mathbf{u}_m) && \text{Theorem LTLC}
\end{align*}

This is a relation of linear dependence (Definition RLD) on the linearly independent set $C$, so the scalars are all zero: $a_1 = a_2 = a_3 = \cdots = a_m = 0$. Then

\begin{align*}
\mathbf{u} &= a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_m\mathbf{u}_m \\
&= 0\mathbf{u}_1 + 0\mathbf{u}_2 + 0\mathbf{u}_3 + \cdots + 0\mathbf{u}_m && \text{Theorem ZSSM} \\
&= \mathbf{0} + \mathbf{0} + \mathbf{0} + \cdots + \mathbf{0} && \text{Theorem ZSSM} \\
&= \mathbf{0} && \text{Property Z}
\end{align*}

Since $\mathbf{u}$ was chosen as an arbitrary vector from $\mathcal{K}(T)$, we have $\mathcal{K}(T) = \{\mathbf{0}\}$ and Theorem KILT tells us that $T$ is injective. ■
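Theorem ILTB turns an injectivity question into a linear independence question about the images of a basis. The Python sketch below tries this on Archetype P and Archetype O using the standard basis of $\mathbb{C}^3$; it is an illustration and not part of the original text. The helper `rank` (a plain forward elimination over exact fractions) and the other names are assumptions made for this example, and a set of $m$ vectors is taken to be linearly independent exactly when its rank is $m$.

```python
from fractions import Fraction

def rank(vectors):
    """Number of pivots found by forward elimination on the given vectors."""
    rows = [[Fraction(e) for e in v] for v in vectors]
    r = 0
    for col in range(len(rows[0])):
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate the entries below the pivot.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def P(x1, x2, x3):
    """Archetype P, transcribed from Example TKAP."""
    return (-x1 + x2 + x3, -x1 + 2*x2 + 2*x3, x1 + x2 + 3*x3,
            2*x1 + 3*x2 + x3, -2*x1 + x2 + 3*x3)

def O(x1, x2, x3):
    """Archetype O, transcribed from Example NKAO."""
    return (-x1 + x2 - 3*x3, -x1 + 2*x2 - 4*x3, x1 + x2 + x3,
            2*x1 + 3*x2 + x3, x1 + 2*x3)

standard_basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

images_P = [P(*e) for e in standard_basis]
images_O = [O(*e) for e in standard_basis]

# Rank 3 means the three images are independent, so the map is injective.
print(rank(images_P), rank(images_O))
```

The outcome agrees with Example IAP and Example NIAO: the images under Archetype P are linearly independent, while those under Archetype O are not.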

Subsection  ILTD 

Injective  Linear  Transformations  and  Dimension 

Theorem ILTD Injective Linear Transformations and Dimension

Suppose that $T\colon U \to V$ is an injective linear transformation. Then $\dim(U) \leq \dim(V)$.

Proof. Suppose to the contrary that $m = \dim(U) > \dim(V) = t$. Let $B$ be a basis of $U$, which will then contain $m$ vectors. Apply $T$ to each element of $B$ to form a set $C$ that is a subset of $V$. By Theorem ILTB, $C$ is linearly independent and therefore must contain $m$ distinct vectors. So we have found a set of $m$ linearly independent vectors in $V$, a vector space of dimension $t$, with $m > t$. However, this contradicts Theorem G, so our assumption is false and $\dim(U) \leq \dim(V)$. ■


§ILT 


Beezer:  A First  Course  in  Linear  Algebra 


459 


Example  NIDAU  Not  injective  by  dimension,  Archetype  U 
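The linear transformation in Archetype U is

\[
T\colon M_{23} \to \mathbb{C}^4, \quad
T\left(\begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}\right)
= \begin{bmatrix}
a + 2b + 12c - 3d + e + 6f \\
2a - b - c + d - 11f \\
a + b + 7c + 2d + e - 3f \\
a + 2b + 12c + 5e - 5f
\end{bmatrix}
\]

Since $\dim(M_{23}) = 6 > 4 = \dim(\mathbb{C}^4)$, $T$ cannot be injective for then $T$ would violate Theorem ILTD. △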


Notice that the previous example made no use of the actual formula defining the function. Merely a comparison of the dimensions of the domain and codomain is enough to conclude that the linear transformation is not injective. Archetype M and Archetype N are two more examples of linear transformations that have “big” domains and “small” codomains, resulting in “collisions” of outputs, and thus are non-injective linear transformations.


Subsection  CILT 

Composition  of  Injective  Linear  Transformations 

In  Subsection  LT.NLTFO  we  saw  how  to  combine  linear  transformations  to  build 
new  linear  transformations,  specifically,  how  to  build  the  composition  of  two  linear 
transformations  (Definition  LTC).  It  will  be  useful  later  to  know  that  the  composition 
of  injective  linear  transformations  is  again  injective,  so  we  prove  that  here. 

Theorem  CILTI  Composition  of  Injective  Linear  Transformations  is  Injective 
Suppose that $T\colon U \to V$ and $S\colon V \to W$ are injective linear transformations. Then $(S \circ T)\colon U \to W$ is an injective linear transformation.

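Proof. That the composition is a linear transformation was established in Theorem CLTLT, so we need only establish that the composition is injective. Applying Definition ILT, choose $\mathbf{x}, \mathbf{y}$ from $U$. Then if $(S \circ T)(\mathbf{x}) = (S \circ T)(\mathbf{y})$,

\begin{align*}
\Rightarrow\quad S(T(\mathbf{x})) &= S(T(\mathbf{y})) && \text{Definition LTC} \\
\Rightarrow\quad T(\mathbf{x}) &= T(\mathbf{y}) && \text{Definition ILT for } S \\
\Rightarrow\quad \mathbf{x} &= \mathbf{y} && \text{Definition ILT for } T
\end{align*}

Thus, by Definition ILT, $S \circ T$ is injective. ■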


Reading  Questions 

1.  Suppose $T\colon \mathbb{C}^8 \to \mathbb{C}^5$ is a linear transformation. Why is $T$ not injective?

2.  Describe  the  kernel  of  an  injective  linear  transformation. 

3.  Theorem KPI should remind you of Theorem PSPHS. Why do we say this?

Exercises

C10 Each archetype below is a linear transformation. Compute the kernel for each.

Archetype M, Archetype N, Archetype O, Archetype P, Archetype Q, Archetype R, Archetype S, Archetype T, Archetype U, Archetype V, Archetype W, Archetype X

C20† The linear transformation $T\colon \mathbb{C}^4 \to \mathbb{C}^3$ is not injective. Find two inputs $\mathbf{x}, \mathbf{y} \in \mathbb{C}^4$ that yield the same output (that is, $T(\mathbf{x}) = T(\mathbf{y})$).

\[
T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\right) =
\begin{bmatrix}
2x_1 + x_2 + x_3 \\
-x_1 + 3x_2 + x_3 - x_4 \\
3x_1 + x_2 + 2x_3 - 2x_4
\end{bmatrix}
\]


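C25† Define the linear transformation

\[
T\colon \mathbb{C}^3 \to \mathbb{C}^2,\quad
T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) =
\begin{bmatrix}
2x_1 - x_2 + 5x_3 \\
-4x_1 + 2x_2 - 10x_3
\end{bmatrix}
\]

Find a basis for the kernel of $T$, $\mathcal{K}(T)$. Is $T$ injective?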

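C26† Let $A = \begin{bmatrix} 1 & 2 & 3 & 1 & 0 \\ 2 & -1 & 1 & 0 & 1 \\ 1 & 2 & -1 & -2 & 1 \\ 1 & 3 & 2 & 1 & 2 \end{bmatrix}$ and let $T\colon \mathbb{C}^5 \to \mathbb{C}^4$ be given by $T(\mathbf{x}) = A\mathbf{x}$. Is $T$ injective? (Hint: No calculation is required.)

C27† Let $T\colon \mathbb{C}^3 \to \mathbb{C}^3$ be given by $T\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = \begin{bmatrix} 2x + y + z \\ x - y + 2z \\ x + 2y - z \end{bmatrix}$. Find $\mathcal{K}(T)$. Is $T$ injective?

C28† Let $A = \begin{bmatrix} 1 & 2 & 3 & 1 \\ 2 & -1 & 1 & 0 \\ 1 & 2 & -1 & -2 \\ 1 & 3 & 2 & 1 \end{bmatrix}$ and let $T\colon \mathbb{C}^4 \to \mathbb{C}^4$ be given by $T(\mathbf{x}) = A\mathbf{x}$. Find $\mathcal{K}(T)$. Is $T$ injective?

C29† Let $A = \begin{bmatrix} 1 & 2 & 1 & 1 \\ 2 & 1 & 1 & 0 \\ 1 & 2 & 1 & 2 \\ 1 & 2 & 1 & 1 \end{bmatrix}$ and let $T\colon \mathbb{C}^4 \to \mathbb{C}^4$ be given by $T(\mathbf{x}) = A\mathbf{x}$. Find $\mathcal{K}(T)$. Is $T$ injective?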

C30† Let $T\colon M_{22} \to P_2$ be given by $T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = (a + b) + (a + c)x + (a + d)x^2$. Is $T$ injective? Find $\mathcal{K}(T)$.


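C31† Given that the linear transformation $T\colon \mathbb{C}^3 \to \mathbb{C}^3$, $T\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = \begin{bmatrix} 2x + y \\ 2y + z \\ x + 2z \end{bmatrix}$ is injective, show directly that $\{T(\mathbf{e}_1), T(\mathbf{e}_2), T(\mathbf{e}_3)\}$ is a linearly independent set.

C32† Given that the linear transformation $T\colon \mathbb{C}^2 \to \mathbb{C}^3$, $T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} x + y \\ 2x + y \\ x + 2y \end{bmatrix}$ is injective, show directly that $\{T(\mathbf{e}_1), T(\mathbf{e}_2)\}$ is a linearly independent set.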


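C33† Given that the linear transformation $T\colon \mathbb{C}^3 \to \mathbb{C}^5$,

\[
T\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) =
\begin{bmatrix} 1 & 3 & 2 \\ 0 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 0 & 1 \\ 3 & 1 & 2 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
\]

is injective, show directly that $\{T(\mathbf{e}_1), T(\mathbf{e}_2), T(\mathbf{e}_3)\}$ is a linearly independent set.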

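C40† Show that the linear transformation $R$ is not injective by finding two different elements of the domain, $\mathbf{x}$ and $\mathbf{y}$, such that $R(\mathbf{x}) = R(\mathbf{y})$. ($S_{22}$ is the vector space of symmetric $2 \times 2$ matrices.)

\[
R\colon S_{22} \to P_1,\quad
R\left(\begin{bmatrix} a & b \\ b & c \end{bmatrix}\right) = (2a - b + c) + (a + b + 2c)x
\]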


M60 Suppose $U$ and $V$ are vector spaces. Define the function $Z\colon U \to V$ by $Z(\mathbf{u}) = \mathbf{0}_V$ for every $\mathbf{u} \in U$. Then by Exercise LT.M60, $Z$ is a linear transformation. Formulate a condition on $U$ that is equivalent to $Z$ being an injective linear transformation. In other words, fill in the blank to complete the following statement (and then give a proof): $Z$ is injective if and only if $U$ is ______. (See Exercise SLT.M60, Exercise IVLT.M60.)

T10† Suppose $T\colon U \to V$ is a linear transformation. For which vectors $\mathbf{v} \in V$ is $T^{-1}(\mathbf{v})$ a subspace of $U$?

T15† Suppose that $T\colon U \to V$ and $S\colon V \to W$ are linear transformations. Prove the following relationship between kernels.

\[
\mathcal{K}(T) \subseteq \mathcal{K}(S \circ T)
\]

T20† Suppose that $A$ is an $m \times n$ matrix. Define the linear transformation $T$ by

\[
T\colon \mathbb{C}^n \to \mathbb{C}^m,\quad T(\mathbf{x}) = A\mathbf{x}
\]

Prove that the kernel of $T$ equals the null space of $A$, $\mathcal{K}(T) = \mathcal{N}(A)$.


Section  SLT 

Surjective  Linear  Transformations 

The  companion  to  an  injection  is  a surjection.  Surjective  linear  transformations  are 
closely  related  to  spanning  sets  and  ranges.  So  as  you  read  this  section  reflect  back 
on  Section  ILT  and  note  the  parallels  and  the  contrasts.  In  the  next  section,  Section 
IVLT,  we  will  combine  the  two  properties. 


Subsection  SLT 

Surjective  Linear  Transformations 

As  usual,  we  lead  with  a definition. 

Definition  SLT  Surjective  Linear  Transformation 

Suppose $T\colon U \to V$ is a linear transformation. Then $T$ is surjective if for every $\mathbf{v} \in V$ there exists a $\mathbf{u} \in U$ so that $T(\mathbf{u}) = \mathbf{v}$. □

Given an arbitrary function, it is possible for there to be an element of the codomain that is not an output of the function (think about the function $y = f(x) = x^2$ and the codomain element $y = -3$). For a surjective function, this never happens. If we choose any element of the codomain ($\mathbf{v} \in V$) then there must be an input from the domain ($\mathbf{u} \in U$) which will create the output when used to evaluate the linear transformation ($T(\mathbf{u}) = \mathbf{v}$). Some authors prefer the term onto where we use surjective, and we will sometimes refer to a surjective linear transformation as a surjection.

Subsection  ESLT 

Examples  of  Surjective  Linear  Transformations 

It  is  perhaps  most  instructive  to  examine  a linear  transformation  that  is  not  surjective 
first. 

Example  NSAQ  Not  surjective,  Archetype  Q 
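Archetype Q is the linear transformation

\[
T\colon \mathbb{C}^5 \to \mathbb{C}^5,\quad
T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}\right) =
\begin{bmatrix}
-2x_1 + 3x_2 + 3x_3 - 6x_4 + 3x_5 \\
-16x_1 + 9x_2 + 12x_3 - 28x_4 + 28x_5 \\
-19x_1 + 7x_2 + 14x_3 - 32x_4 + 37x_5 \\
-21x_1 + 9x_2 + 15x_3 - 35x_4 + 39x_5 \\
-9x_1 + 5x_2 + 7x_3 - 16x_4 + 16x_5
\end{bmatrix}
\]

We will demonstrate that

\[
\mathbf{v} = \begin{bmatrix} -1 \\ 2 \\ 3 \\ -1 \\ 4 \end{bmatrix}
\]

is an unobtainable element of the codomain. Suppose to the contrary that $\mathbf{u}$ is an element of the domain such that $T(\mathbf{u}) = \mathbf{v}$.

Then

\begin{align*}
\begin{bmatrix} -1 \\ 2 \\ 3 \\ -1 \\ 4 \end{bmatrix} = \mathbf{v} = T(\mathbf{u})
&= T\left(\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{bmatrix}\right) \\
&= \begin{bmatrix}
-2u_1 + 3u_2 + 3u_3 - 6u_4 + 3u_5 \\
-16u_1 + 9u_2 + 12u_3 - 28u_4 + 28u_5 \\
-19u_1 + 7u_2 + 14u_3 - 32u_4 + 37u_5 \\
-21u_1 + 9u_2 + 15u_3 - 35u_4 + 39u_5 \\
-9u_1 + 5u_2 + 7u_3 - 16u_4 + 16u_5
\end{bmatrix} \\
&= \begin{bmatrix}
-2 & 3 & 3 & -6 & 3 \\
-16 & 9 & 12 & -28 & 28 \\
-19 & 7 & 14 & -32 & 37 \\
-21 & 9 & 15 & -35 & 39 \\
-9 & 5 & 7 & -16 & 16
\end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{bmatrix}
\end{align*}

Now we recognize the appropriate input vector $\mathbf{u}$ as a solution to a linear system of equations. Form the augmented matrix of the system, and row-reduce to

\[
\begin{bmatrix}
\boxed{1} & 0 & 0 & 0 & -1 & 0 \\
0 & \boxed{1} & 0 & 0 & -\frac{4}{3} & 0 \\
0 & 0 & \boxed{1} & 0 & -\frac{1}{3} & 0 \\
0 & 0 & 0 & \boxed{1} & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & \boxed{1}
\end{bmatrix}
\]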

With  a leading  1 in  the  last  column,  Theorem  RCLS  tells  us  the  system  is 
inconsistent.  From  the  absence  of  any  solutions  we  conclude  that  no  such  vector  u 
exists,  and  by  Definition  SLT,  T is  not  surjective. 

Again, do not concern yourself with how $\mathbf{v}$ was selected, as this will be explained shortly. However, do understand why this vector provides enough evidence to conclude that $T$ is not surjective. △
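The consistency test used in Example NSAQ can also be checked mechanically. The following is a minimal sketch in plain Python, not the book's Sage; the helper name `rank` and the variable names are assumptions made for illustration. It compares the rank of the coefficient matrix with the rank of the augmented matrix, since the system has a solution exactly when the two agree.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) via exact Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = 0
    for col in range(len(m[0])):
        # Look for a usable pivot at or below the current pivot row.
        hit = next((r for r in range(pivots, len(m)) if m[r][col] != 0), None)
        if hit is None:
            continue
        m[pivots], m[hit] = m[hit], m[pivots]
        # Clear the rest of this column using the pivot row.
        for r in range(len(m)):
            if r != pivots and m[r][col] != 0:
                factor = m[r][col] / m[pivots][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivots])]
        pivots += 1
        if pivots == len(m):
            break
    return pivots

# Coefficient matrix of Archetype Q and the target vector v from the example.
A = [[-2, 3, 3, -6, 3],
     [-16, 9, 12, -28, 28],
     [-19, 7, 14, -32, 37],
     [-21, 9, 15, -35, 39],
     [-9, 5, 7, -16, 16]]
v = [-1, 2, 3, -1, 4]

augmented = [row + [entry] for row, entry in zip(A, v)]
rank_A = rank(A)
rank_aug = rank(augmented)

# A u = v has a solution exactly when the two ranks agree.
consistent = (rank_A == rank_aug)
print(rank_A, rank_aug, consistent)
```

A larger rank for the augmented matrix corresponds to the leading 1 in the final column of the row-reduced matrix above.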


Here is a cartoon of a non-surjective linear transformation. Notice that the central feature of this cartoon is that the vector $\mathbf{v} \in V$ does not have an arrow pointing to it, implying there is no $\mathbf{u} \in U$ such that $T(\mathbf{u}) = \mathbf{v}$. Even though this happens again with a second unnamed vector in $V$, it only takes one occurrence to destroy the possibility of surjectivity.


Diagram NSLT: Non-Surjective Linear Transformation


To  show  that  a linear  transformation  is  not  surjective,  it  is  enough  to  find  a single 
element  of  the  codomain  that  is  never  created  by  any  input,  as  in  Example  NSAQ. 
However,  to  show  that  a linear  transformation  is  surjective  we  must  establish  that 
every  element  of  the  codomain  occurs  as  an  output  of  the  linear  transformation  for 
some  appropriate  input. 


Example  SAR  Surjective,  Archetype  R 
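Archetype R is the linear transformation

\[
T\colon \mathbb{C}^5 \to \mathbb{C}^5,\quad
T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}\right) =
\begin{bmatrix}
-65x_1 + 128x_2 + 10x_3 - 262x_4 + 40x_5 \\
36x_1 - 73x_2 - x_3 + 151x_4 - 16x_5 \\
-44x_1 + 88x_2 + 5x_3 - 180x_4 + 24x_5 \\
34x_1 - 68x_2 - 3x_3 + 140x_4 - 18x_5 \\
12x_1 - 24x_2 - x_3 + 49x_4 - 5x_5
\end{bmatrix}
\]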

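To establish that $R$ is surjective we must begin with a totally arbitrary element of the codomain, $\mathbf{v}$, and somehow find an input vector $\mathbf{u}$ such that $T(\mathbf{u}) = \mathbf{v}$. We desire,

\begin{align*}
T(\mathbf{u}) &= \mathbf{v} \\
\begin{bmatrix}
-65u_1 + 128u_2 + 10u_3 - 262u_4 + 40u_5 \\
36u_1 - 73u_2 - u_3 + 151u_4 - 16u_5 \\
-44u_1 + 88u_2 + 5u_3 - 180u_4 + 24u_5 \\
34u_1 - 68u_2 - 3u_3 + 140u_4 - 18u_5 \\
12u_1 - 24u_2 - u_3 + 49u_4 - 5u_5
\end{bmatrix}
&= \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{bmatrix} \\
\begin{bmatrix}
-65 & 128 & 10 & -262 & 40 \\
36 & -73 & -1 & 151 & -16 \\
-44 & 88 & 5 & -180 & 24 \\
34 & -68 & -3 & 140 & -18 \\
12 & -24 & -1 & 49 & -5
\end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{bmatrix}
&= \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{bmatrix}
\end{align*}

We recognize this equation as a system of equations in the variables $u_i$, but our vector of constants contains symbols. In general, we would have to row-reduce the augmented matrix by hand, due to the symbolic final column. However, in this particular example, the $5 \times 5$ coefficient matrix is nonsingular and so has an inverse (Theorem NI, Definition MI).

\[
\begin{bmatrix}
-65 & 128 & 10 & -262 & 40 \\
36 & -73 & -1 & 151 & -16 \\
-44 & 88 & 5 & -180 & 24 \\
34 & -68 & -3 & 140 & -18 \\
12 & -24 & -1 & 49 & -5
\end{bmatrix}^{-1}
=
\begin{bmatrix}
-47 & 92 & 1 & -181 & -14 \\
27 & -55 & \frac{7}{2} & \frac{221}{2} & 11 \\
-32 & 64 & -1 & -126 & -12 \\
25 & -50 & \frac{3}{2} & \frac{199}{2} & 9 \\
9 & -18 & \frac{1}{2} & \frac{71}{2} & 4
\end{bmatrix}
\]

so we find that

\begin{align*}
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{bmatrix}
&= \begin{bmatrix}
-47 & 92 & 1 & -181 & -14 \\
27 & -55 & \frac{7}{2} & \frac{221}{2} & 11 \\
-32 & 64 & -1 & -126 & -12 \\
25 & -50 & \frac{3}{2} & \frac{199}{2} & 9 \\
9 & -18 & \frac{1}{2} & \frac{71}{2} & 4
\end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{bmatrix} \\
&= \begin{bmatrix}
-47v_1 + 92v_2 + v_3 - 181v_4 - 14v_5 \\
27v_1 - 55v_2 + \frac{7}{2}v_3 + \frac{221}{2}v_4 + 11v_5 \\
-32v_1 + 64v_2 - v_3 - 126v_4 - 12v_5 \\
25v_1 - 50v_2 + \frac{3}{2}v_3 + \frac{199}{2}v_4 + 9v_5 \\
9v_1 - 18v_2 + \frac{1}{2}v_3 + \frac{71}{2}v_4 + 4v_5
\end{bmatrix}
\end{align*}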

This establishes that if we are given any output vector $\mathbf{v}$, we can use its components in this final expression to formulate a vector $\mathbf{u}$ such that $T(\mathbf{u}) = \mathbf{v}$. So by Definition SLT we now know that $T$ is surjective. You might try to verify this condition in its full generality (i.e. evaluate $T$ with this final expression and see if you get $\mathbf{v}$ as the result), or test it more specifically for some numerical vector $\mathbf{v}$ (see Exercise SLT.C20). △
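The numerical test suggested at the end of Example SAR can be sketched in a few lines of plain Python with exact fractions. This is an illustration only, not the book's Sage; the names `A`, `B`, `mat_vec`, `mat_mul` and the sample vector `v` are assumptions chosen here. The sketch checks that the matrix `B` from the example really is the inverse of the coefficient matrix `A`, then recovers an input for one arbitrary output.

```python
from fractions import Fraction as F

# Coefficient matrix of Archetype R and the inverse quoted in Example SAR.
A = [[-65, 128, 10, -262, 40],
     [36, -73, -1, 151, -16],
     [-44, 88, 5, -180, 24],
     [34, -68, -3, 140, -18],
     [12, -24, -1, 49, -5]]
B = [[-47, 92, 1, -181, -14],
     [27, -55, F(7, 2), F(221, 2), 11],
     [-32, 64, -1, -126, -12],
     [25, -50, F(3, 2), F(199, 2), 9],
     [9, -18, F(1, 2), F(71, 2), 4]]

def mat_vec(M, x):
    """Matrix-vector product, with M given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def mat_mul(M, N):
    """Matrix-matrix product of two lists of rows."""
    cols = list(zip(*N))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in M]

identity = [[1 if i == j else 0 for j in range(5)] for i in range(5)]
is_inverse = (mat_mul(A, B) == identity)

v = [1, 2, 3, 4, 5]        # an arbitrary sample output, chosen for illustration
u = mat_vec(B, v)          # candidate input u = B v
recovered = mat_vec(A, u)  # T(u) = A u should give back v
print(is_inverse, recovered == v)
```

Any other choice of `v` works the same way, which is the content of the surjectivity argument.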


Here is the cartoon for a surjective linear transformation. It is meant to suggest that for every output in $V$ there is at least one input in $U$ that is sent to the output. (Even though we have depicted several inputs sent to each output.) The key feature of this cartoon is that there are no vectors in $V$ without an arrow pointing to them.

Diagram SLT: Surjective Linear Transformation


Let  us  now  examine  a surjective  linear  transformation  between  abstract  vector 
spaces. 


Example  SAV  Surjective,  Archetype  V 
Archetype V is defined by

\[
T\colon P_3 \to M_{22},\quad
T\left(a + bx + cx^2 + dx^3\right) =
\begin{bmatrix} a + b & a - 2c \\ d & b - d \end{bmatrix}
\]

To establish that the linear transformation is surjective, begin by choosing an arbitrary output. In this example, we need to choose an arbitrary $2 \times 2$ matrix, say

\[
\mathbf{v} = \begin{bmatrix} x & y \\ z & w \end{bmatrix}
\]

and we would like to find an input polynomial

\[
\mathbf{u} = a + bx + cx^2 + dx^3
\]

so that $T(\mathbf{u}) = \mathbf{v}$. So we have,


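\begin{align*}
\begin{bmatrix} x & y \\ z & w \end{bmatrix} &= \mathbf{v} \\
&= T(\mathbf{u}) \\
&= T\left(a + bx + cx^2 + dx^3\right) \\
&= \begin{bmatrix} a + b & a - 2c \\ d & b - d \end{bmatrix}
\end{align*}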
Matrix equality leads us to the system of four equations in the four unknowns, $a, b, c, d$,

\begin{align*}
a + b &= x \\
a - 2c &= y \\
d &= z \\
b - d &= w
\end{align*}


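which can be rewritten as a matrix equation,

\[
\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & -2 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & -1 \end{bmatrix}
\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}
=
\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}
\]

The coefficient matrix is nonsingular, hence it has an inverse,

\[
\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & -2 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & -1 \end{bmatrix}^{-1}
=
\begin{bmatrix} 1 & 0 & -1 & -1 \\ 0 & 0 & 1 & 1 \\ \frac{1}{2} & -\frac{1}{2} & -\frac{1}{2} & -\frac{1}{2} \\ 0 & 0 & 1 & 0 \end{bmatrix}
\]

so we have

\[
\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & -1 & -1 \\ 0 & 0 & 1 & 1 \\ \frac{1}{2} & -\frac{1}{2} & -\frac{1}{2} & -\frac{1}{2} \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}
=
\begin{bmatrix} x - z - w \\ z + w \\ \frac{1}{2}(x - y - z - w) \\ z \end{bmatrix}
\]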

So the input polynomial $\mathbf{u} = (x - z - w) + (z + w)x + \frac{1}{2}(x - y - z - w)x^2 + zx^3$ will yield the output matrix $\mathbf{v}$, no matter what form $\mathbf{v}$ takes. This means by Definition SLT that $T$ is surjective. All the same, let us do a concrete demonstration and evaluate $T$ with $\mathbf{u}$,

\begin{align*}
T(\mathbf{u}) &= T\left((x - z - w) + (z + w)x + \tfrac{1}{2}(x - y - z - w)x^2 + zx^3\right) \\
&= \begin{bmatrix} (x - z - w) + (z + w) & (x - z - w) - 2\left(\tfrac{1}{2}(x - y - z - w)\right) \\ z & (z + w) - z \end{bmatrix} \\
&= \begin{bmatrix} x & y \\ z & w \end{bmatrix} \\
&= \mathbf{v}
\end{align*}

△
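The concrete demonstration in Example SAV can be repeated numerically. Below is a minimal Python sketch, assuming polynomials in $P_3$ are represented by their coefficient tuples $(a, b, c, d)$ and matrices by nested lists; the function names `T` and `preimage` and the sample values are illustrative choices, not part of the text.

```python
from fractions import Fraction

def T(a, b, c, d):
    """Archetype V applied to a + b x + c x^2 + d x^3, returned as a 2x2 matrix."""
    return [[a + b, a - 2 * c], [d, b - d]]

def preimage(x, y, z, w):
    """Coefficients (a, b, c, d) of the input polynomial found in the example."""
    return (x - z - w, z + w, Fraction(x - y - z - w, 2), z)

# A few sample output matrices [[x, y], [z, w]], written as (x, y, z, w).
samples = [(1, 2, 3, 4), (0, 0, 0, 0), (-5, 7, 2, -3)]
results = [T(*preimage(x, y, z, w)) == [[x, y], [z, w]] for x, y, z, w in samples]
print(results)
```

Each entry of `results` reports whether the constructed polynomial is sent back to the chosen matrix.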


Subsection  RLT 

Range  of  a Linear  Transformation 

For a linear transformation $T\colon U \to V$, the range is a subset of the codomain $V$. Informally, it is the set of all outputs that the transformation creates when fed every possible input from the domain. It will have some natural connections with the column space of a matrix, so we will keep the same notation, and if you think about your objects, then there should be little confusion. Here is the careful definition.

Definition RLT Range of a Linear Transformation

Suppose $T\colon U \to V$ is a linear transformation. Then the range of $T$ is the set

\[
\mathcal{R}(T) = \{\, T(\mathbf{u}) \mid \mathbf{u} \in U \,\}
\]

□


Example  RAO  Range,  Archetype  O 
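Archetype O is the linear transformation

\[
T\colon \mathbb{C}^3 \to \mathbb{C}^5,\quad
T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) =
\begin{bmatrix}
-x_1 + x_2 - 3x_3 \\
-x_1 + 2x_2 - 4x_3 \\
x_1 + x_2 + x_3 \\
2x_1 + 3x_2 + x_3 \\
x_1 + 2x_3
\end{bmatrix}
\]

To determine the elements of $\mathbb{C}^5$ in $\mathcal{R}(T)$, find those vectors $\mathbf{v}$ such that $T(\mathbf{u}) = \mathbf{v}$ for some $\mathbf{u} \in \mathbb{C}^3$,

\begin{align*}
\mathbf{v} &= T(\mathbf{u}) \\
&= \begin{bmatrix}
-u_1 + u_2 - 3u_3 \\
-u_1 + 2u_2 - 4u_3 \\
u_1 + u_2 + u_3 \\
2u_1 + 3u_2 + u_3 \\
u_1 + 2u_3
\end{bmatrix} \\
&= \begin{bmatrix} -u_1 \\ -u_1 \\ u_1 \\ 2u_1 \\ u_1 \end{bmatrix}
+ \begin{bmatrix} u_2 \\ 2u_2 \\ u_2 \\ 3u_2 \\ 0 \end{bmatrix}
+ \begin{bmatrix} -3u_3 \\ -4u_3 \\ u_3 \\ u_3 \\ 2u_3 \end{bmatrix} \\
&= u_1\begin{bmatrix} -1 \\ -1 \\ 1 \\ 2 \\ 1 \end{bmatrix}
+ u_2\begin{bmatrix} 1 \\ 2 \\ 1 \\ 3 \\ 0 \end{bmatrix}
+ u_3\begin{bmatrix} -3 \\ -4 \\ 1 \\ 1 \\ 2 \end{bmatrix}
\end{align*}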

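This says that every output of $T$ (in other words, the vector $\mathbf{v}$) can be written as a linear combination of the three vectors

\[
\begin{bmatrix} -1 \\ -1 \\ 1 \\ 2 \\ 1 \end{bmatrix},\quad
\begin{bmatrix} 1 \\ 2 \\ 1 \\ 3 \\ 0 \end{bmatrix},\quad
\begin{bmatrix} -3 \\ -4 \\ 1 \\ 1 \\ 2 \end{bmatrix}
\]

using the scalars $u_1, u_2, u_3$. Furthermore, since $\mathbf{u}$ can be any element of $\mathbb{C}^3$, every such linear combination is an output. This means that

\[
\mathcal{R}(T) = \left\langle\left\{
\begin{bmatrix} -1 \\ -1 \\ 1 \\ 2 \\ 1 \end{bmatrix},\,
\begin{bmatrix} 1 \\ 2 \\ 1 \\ 3 \\ 0 \end{bmatrix},\,
\begin{bmatrix} -3 \\ -4 \\ 1 \\ 1 \\ 2 \end{bmatrix}
\right\}\right\rangle
\]

The three vectors in this spanning set for $\mathcal{R}(T)$ form a linearly dependent set (check this!). So we can find a more economical presentation by any of the various methods from Section CRS and Section FS. We will place the vectors into a matrix as rows, row-reduce, toss out zero rows and appeal to Theorem BRS, so we can describe the range of $T$ with a basis,

\[
\mathcal{R}(T) = \left\langle\left\{
\begin{bmatrix} 1 \\ 0 \\ -3 \\ -7 \\ -2 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 1 \\ 2 \\ 5 \\ 1 \end{bmatrix}
\right\}\right\rangle
\]

△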
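The row-reduction step described in Example RAO can be sketched in plain Python with exact fractions. This is an illustrative stand-in for the book's Sage; the helper name `rref` is an assumption. The three spanning vectors are placed in a matrix as rows, the matrix is row-reduced, and the zero rows are discarded, in the spirit of Theorem BRS.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        hit = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if hit is None:
            continue
        m[pivot_row], m[hit] = m[hit], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        # Clear every other entry in the pivot column.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

# The spanning vectors for the range of Archetype O, written as rows.
spanning_rows = [[-1, -1, 1, 2, 1],
                 [1, 2, 1, 3, 0],
                 [-3, -4, 1, 1, 2]]

reduced = rref(spanning_rows)
basis = [row for row in reduced if any(x != 0 for x in row)]
for row in basis:
    print([str(x) for x in row])
```

The nonzero rows that survive, read as column vectors, form a basis of the range, and their count is the dimension of the range.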


We know that the span of a set of vectors is always a subspace (Theorem SSS), so the range computed in Example RAO is also a subspace. This is no accident; the range of a linear transformation is always a subspace.


Theorem  RLTS  Range  of  a Linear  Transformation  is  a Subspace 

Suppose that $T\colon U \to V$ is a linear transformation. Then the range of $T$, $\mathcal{R}(T)$, is a subspace of $V$.


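Proof. We can apply the three-part test of Theorem TSS. First, $\mathbf{0}_U \in U$ and $T(\mathbf{0}_U) = \mathbf{0}_V$ by Theorem LTTZZ, so $\mathbf{0}_V \in \mathcal{R}(T)$ and we know that the range is nonempty.

Suppose we assume that $\mathbf{x}, \mathbf{y} \in \mathcal{R}(T)$. Is $\mathbf{x} + \mathbf{y} \in \mathcal{R}(T)$? If $\mathbf{x}, \mathbf{y} \in \mathcal{R}(T)$ then we know there are vectors $\mathbf{w}, \mathbf{z} \in U$ such that $T(\mathbf{w}) = \mathbf{x}$ and $T(\mathbf{z}) = \mathbf{y}$. Because $U$ is a vector space, additive closure (Property AC) implies that $\mathbf{w} + \mathbf{z} \in U$.

Then

\begin{align*}
T(\mathbf{w} + \mathbf{z}) &= T(\mathbf{w}) + T(\mathbf{z}) && \text{Definition LT} \\
&= \mathbf{x} + \mathbf{y} && \text{Definition of } \mathbf{w} \text{ and } \mathbf{z}
\end{align*}

So we have found an input, $\mathbf{w} + \mathbf{z}$, which when fed into $T$ creates $\mathbf{x} + \mathbf{y}$ as an output. This qualifies $\mathbf{x} + \mathbf{y}$ for membership in $\mathcal{R}(T)$. So we have additive closure.

Suppose we assume that $\alpha \in \mathbb{C}$ and $\mathbf{x} \in \mathcal{R}(T)$. Is $\alpha\mathbf{x} \in \mathcal{R}(T)$? If $\mathbf{x} \in \mathcal{R}(T)$, then there is a vector $\mathbf{w} \in U$ such that $T(\mathbf{w}) = \mathbf{x}$. Because $U$ is a vector space, scalar closure implies that $\alpha\mathbf{w} \in U$. Then

\begin{align*}
T(\alpha\mathbf{w}) &= \alpha T(\mathbf{w}) && \text{Definition LT} \\
&= \alpha\mathbf{x} && \text{Definition of } \mathbf{w}
\end{align*}

So we have found an input ($\alpha\mathbf{w}$) which when fed into $T$ creates $\alpha\mathbf{x}$ as an output. This qualifies $\alpha\mathbf{x}$ for membership in $\mathcal{R}(T)$. So we have scalar closure and Theorem TSS tells us that $\mathcal{R}(T)$ is a subspace of $V$. ■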


Let  us  compute  another  range,  now  that  we  know  in  advance  that  it  will  be  a 
subspace. 


Example  FRAN  Full  range,  Archetype  N 
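Archetype N is the linear transformation

\[
T\colon \mathbb{C}^5 \to \mathbb{C}^3,\quad
T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}\right) =
\begin{bmatrix}
2x_1 + x_2 + 3x_3 - 4x_4 + 5x_5 \\
x_1 - 2x_2 + 3x_3 - 9x_4 + 3x_5 \\
3x_1 + 4x_3 - 6x_4 + 5x_5
\end{bmatrix}
\]

To determine the elements of $\mathbb{C}^3$ in $\mathcal{R}(T)$, find those vectors $\mathbf{v}$ such that $T(\mathbf{u}) = \mathbf{v}$ for some $\mathbf{u} \in \mathbb{C}^5$,

\begin{align*}
\mathbf{v} &= T(\mathbf{u}) \\
&= \begin{bmatrix}
2u_1 + u_2 + 3u_3 - 4u_4 + 5u_5 \\
u_1 - 2u_2 + 3u_3 - 9u_4 + 3u_5 \\
3u_1 + 4u_3 - 6u_4 + 5u_5
\end{bmatrix} \\
&= \begin{bmatrix} 2u_1 \\ u_1 \\ 3u_1 \end{bmatrix}
+ \begin{bmatrix} u_2 \\ -2u_2 \\ 0 \end{bmatrix}
+ \begin{bmatrix} 3u_3 \\ 3u_3 \\ 4u_3 \end{bmatrix}
+ \begin{bmatrix} -4u_4 \\ -9u_4 \\ -6u_4 \end{bmatrix}
+ \begin{bmatrix} 5u_5 \\ 3u_5 \\ 5u_5 \end{bmatrix} \\
&= u_1\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}
+ u_2\begin{bmatrix} 1 \\ -2 \\ 0 \end{bmatrix}
+ u_3\begin{bmatrix} 3 \\ 3 \\ 4 \end{bmatrix}
+ u_4\begin{bmatrix} -4 \\ -9 \\ -6 \end{bmatrix}
+ u_5\begin{bmatrix} 5 \\ 3 \\ 5 \end{bmatrix}
\end{align*}

This says that every output of $T$ (in other words, the vector $\mathbf{v}$) can be written as a linear combination of the five vectors

\[
\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix},\quad
\begin{bmatrix} 1 \\ -2 \\ 0 \end{bmatrix},\quad
\begin{bmatrix} 3 \\ 3 \\ 4 \end{bmatrix},\quad
\begin{bmatrix} -4 \\ -9 \\ -6 \end{bmatrix},\quad
\begin{bmatrix} 5 \\ 3 \\ 5 \end{bmatrix}
\]

using the scalars $u_1, u_2, u_3, u_4, u_5$. Furthermore, since $\mathbf{u}$ can be any element of $\mathbb{C}^5$, every such linear combination is an output. This means that

\[
\mathcal{R}(T) = \left\langle\left\{
\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix},\,
\begin{bmatrix} 1 \\ -2 \\ 0 \end{bmatrix},\,
\begin{bmatrix} 3 \\ 3 \\ 4 \end{bmatrix},\,
\begin{bmatrix} -4 \\ -9 \\ -6 \end{bmatrix},\,
\begin{bmatrix} 5 \\ 3 \\ 5 \end{bmatrix}
\right\}\right\rangle
\]

The five vectors in this spanning set for $\mathcal{R}(T)$ form a linearly dependent set (Theorem MVSLD). So we can find a more economical presentation by any of the various methods from Section CRS and Section FS. We will place the vectors into a matrix as rows, row-reduce, toss out zero rows and appeal to Theorem BRS, so we can describe the range of $T$ with a (nice) basis,

\[
\mathcal{R}(T) = \left\langle\left\{
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
\right\}\right\rangle = \mathbb{C}^3
\]

△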


In  contrast  to  injective  linear  transformations  having  small  (trivial)  kernels 
(Theorem  KILT),  surjective  linear  transformations  have  large  ranges,  as  indicated  in 
the  next  theorem. 


Theorem  RSLT  Range  of  a Surjective  Linear  Transformation 

Suppose that $T\colon U \to V$ is a linear transformation. Then $T$ is surjective if and only if the range of $T$ equals the codomain, $\mathcal{R}(T) = V$.


Proof. ($\Rightarrow$) By Definition RLT, we know that $\mathcal{R}(T) \subseteq V$. To establish the reverse inclusion, assume $\mathbf{v} \in V$. Then since $T$ is surjective (Definition SLT), there exists a vector $\mathbf{u} \in U$ so that $T(\mathbf{u}) = \mathbf{v}$. However, the existence of $\mathbf{u}$ gains $\mathbf{v}$ membership in $\mathcal{R}(T)$, so $V \subseteq \mathcal{R}(T)$. Thus, $\mathcal{R}(T) = V$.

($\Leftarrow$) To establish that $T$ is surjective, choose $\mathbf{v} \in V$. Since we are assuming that $\mathcal{R}(T) = V$, $\mathbf{v} \in \mathcal{R}(T)$. This says there is a vector $\mathbf{u} \in U$ so that $T(\mathbf{u}) = \mathbf{v}$, i.e. $T$ is surjective. ■


Example  NSAQR  Not  surjective,  Archetype  Q,  revisited 

We are now in a position to revisit our first example in this section, Example NSAQ. In that example, we showed that Archetype Q is not surjective by constructing a vector in the codomain where no element of the domain could be used to evaluate the linear transformation to create the output, thus violating Definition SLT. Just where did this vector come from?

The short answer is that the vector

\[
\mathbf{v} = \begin{bmatrix} -1 \\ 2 \\ 3 \\ -1 \\ 4 \end{bmatrix}
\]

was constructed to lie outside of the range of $T$. How was this accomplished? First, the range of $T$ is given by

\[
\mathcal{R}(T) = \left\langle\left\{
\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ -1 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \\ -1 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 2 \end{bmatrix}
\right\}\right\rangle
\]

Suppose an element of the range $\mathbf{v}^*$ has its first 4 components equal to $-1$, $2$, $3$, $-1$, in that order. Then to be an element of $\mathcal{R}(T)$, we would have

\[
\mathbf{v}^* = (-1)\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
+ (2)\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ -1 \end{bmatrix}
+ (3)\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \\ -1 \end{bmatrix}
+ (-1)\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 2 \end{bmatrix}
= \begin{bmatrix} -1 \\ 2 \\ 3 \\ -1 \\ -8 \end{bmatrix}
\]

So the only vector in the range with these first four components specified must have $-8$ in the fifth component. To set the fifth component to any other value (say, 4) will result in a vector ($\mathbf{v}$ in Example NSAQ) outside of the range. Any attempt to find an input for $T$ that will produce $\mathbf{v}$ as an output will be doomed to failure.

Whenever the range of a linear transformation is not the whole codomain, we can employ this device and conclude that the linear transformation is not surjective. This is another way of viewing Theorem RSLT. For a surjective linear transformation, the range is all of the codomain and there is no choice for a vector $\mathbf{v}$ that lies in $V$, yet not in the range. For every one of the archetypes that is not surjective, there is an example presented of exactly this form. △
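The device in Example NSAQR, fixing the first four components and reading off the forced fifth one, is short enough to sketch in Python. The function name `range_vector` is a hypothetical helper introduced here; the basis vectors are the ones quoted in the example.

```python
# Basis of the range of Archetype Q, as quoted in Example NSAQR.
basis = [[1, 0, 0, 0, 1],
         [0, 1, 0, 0, -1],
         [0, 0, 1, 0, -1],
         [0, 0, 0, 1, 2]]

def range_vector(first_four):
    """The only vector in the span of `basis` with the given first four components.

    Because the first four coordinates of the basis vectors form an identity
    pattern, the scalars of the linear combination are the components themselves.
    """
    return [sum(c * b[i] for c, b in zip(first_four, basis)) for i in range(5)]

v_star = range_vector([-1, 2, 3, -1])

# The vector v from Example NSAQ: same first four components, fifth set to 4.
v = [-1, 2, 3, -1, 4]
in_range = (v == range_vector(v[:4]))
print(v_star, in_range)
```

The same membership test applies to any candidate vector in $\mathbb{C}^5$: compare it with the range vector built from its own first four components.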


Example  NSAO  Not  surjective,  Archetype  O 

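In Example RAO the range of Archetype O was determined to be

\[
\mathcal{R}(T) = \left\langle\left\{
\begin{bmatrix} 1 \\ 0 \\ -3 \\ -7 \\ -2 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 1 \\ 2 \\ 5 \\ 1 \end{bmatrix}
\right\}\right\rangle
\]

a subspace of dimension 2 in $\mathbb{C}^5$. Since $\mathcal{R}(T) \neq \mathbb{C}^5$, Theorem RSLT says $T$ is not surjective. △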


Example  SAN  Surjective,  Archetype  N 

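The range of Archetype N was computed in Example FRAN to be

\[
\mathcal{R}(T) = \left\langle\left\{
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
\right\}\right\rangle
\]

Since the basis for this subspace is the set of standard unit vectors for $\mathbb{C}^3$ (Theorem SUVB), we have $\mathcal{R}(T) = \mathbb{C}^3$ and by Theorem RSLT, $T$ is surjective. △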


Subsection  SSSLT 

Spanning  Sets  and  Surjective  Linear  Transformations 

Just  as  injective  linear  transformations  are  allied  with  linear  independence  (Theorem 
ILTLI,  Theorem  ILTB),  surjective  linear  transformations  are  allied  with  spanning 
sets. 

Theorem  SSRLT  Spanning  Set  for  Range  of  a Linear  Transformation 
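Suppose that $T\colon U \to V$ is a linear transformation and

\[
S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_t\}
\]

spans $U$. Then

\[
R = \{T(\mathbf{u}_1), T(\mathbf{u}_2), T(\mathbf{u}_3), \ldots, T(\mathbf{u}_t)\}
\]

spans $\mathcal{R}(T)$.

Proof. We need to establish that $\mathcal{R}(T) = \langle R \rangle$, a set equality. First we establish that $\mathcal{R}(T) \subseteq \langle R \rangle$. To this end, choose $\mathbf{v} \in \mathcal{R}(T)$. Then there exists a vector $\mathbf{u} \in U$, such that $T(\mathbf{u}) = \mathbf{v}$ (Definition RLT). Because $S$ spans $U$ there are scalars, $a_1, a_2, a_3, \ldots, a_t$, such that

\[
\mathbf{u} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_t\mathbf{u}_t
\]

Then

\begin{align*}
\mathbf{v} &= T(\mathbf{u}) && \text{Definition RLT} \\
&= T(a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_t\mathbf{u}_t) && \text{Definition SSVS} \\
&= a_1T(\mathbf{u}_1) + a_2T(\mathbf{u}_2) + a_3T(\mathbf{u}_3) + \cdots + a_tT(\mathbf{u}_t) && \text{Theorem LTLC}
\end{align*}

which establishes that $\mathbf{v} \in \langle R \rangle$ (Definition SS). So $\mathcal{R}(T) \subseteq \langle R \rangle$.

To establish the opposite inclusion, choose an element of the span of $R$, say $\mathbf{v} \in \langle R \rangle$. Then there are scalars $b_1, b_2, b_3, \ldots, b_t$ so that

\begin{align*}
\mathbf{v} &= b_1T(\mathbf{u}_1) + b_2T(\mathbf{u}_2) + b_3T(\mathbf{u}_3) + \cdots + b_tT(\mathbf{u}_t) && \text{Definition SS} \\
&= T(b_1\mathbf{u}_1 + b_2\mathbf{u}_2 + b_3\mathbf{u}_3 + \cdots + b_t\mathbf{u}_t) && \text{Theorem LTLC}
\end{align*}

This demonstrates that $\mathbf{v}$ is an output of the linear transformation $T$, so $\mathbf{v} \in \mathcal{R}(T)$. Therefore $\langle R \rangle \subseteq \mathcal{R}(T)$, so we have the set equality $\mathcal{R}(T) = \langle R \rangle$ (Definition SE). In other words, $R$ spans $\mathcal{R}(T)$ (Definition SSVS). ■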


Theorem SSRLT provides an easy way to begin the construction of a basis for the range of a linear transformation, since the construction of a spanning set requires simply evaluating the linear transformation on a spanning set of the domain. In practice the best choice for a spanning set of the domain would be as small as possible, in other words, a basis. The resulting spanning set for the range may not be linearly independent, so to find a basis for the range might require tossing out redundant vectors from the spanning set. Here is an example.


Example  BRLT  A basis  for  the  range  of  a linear  transformation 
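Define the linear transformation $T\colon M_{22} \to P_2$ by

\[
T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = (a + 2b + 8c + d) + (-3a + 2b + 5d)x + (a + b + 5c)x^2
\]

A convenient spanning set for $M_{22}$ is the basis

\[
S = \left\{
\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\,
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\,
\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},\,
\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
\right\}
\]

So by Theorem SSRLT, a spanning set for $\mathcal{R}(T)$ is

\begin{align*}
R &= \left\{
T\left(\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\right),\,
T\left(\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\right),\,
T\left(\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}\right),\,
T\left(\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\right)
\right\} \\
&= \left\{1 - 3x + x^2,\ 2 + 2x + x^2,\ 8 + 5x^2,\ 1 + 5x\right\}
\end{align*}

The set $R$ is not linearly independent, so if we desire a basis for $\mathcal{R}(T)$, we need to eliminate some redundant vectors. Two particular relations of linear dependence on $R$ are

\begin{align*}
(-2)(1 - 3x + x^2) + (-3)(2 + 2x + x^2) + (8 + 5x^2) &= 0 + 0x + 0x^2 = \mathbf{0} \\
(1 - 3x + x^2) + (-1)(2 + 2x + x^2) + (1 + 5x) &= 0 + 0x + 0x^2 = \mathbf{0}
\end{align*}

These, individually, allow us to remove $8 + 5x^2$ and $1 + 5x$ from $R$ without destroying the property that $R$ spans $\mathcal{R}(T)$. The two remaining vectors are linearly independent (check this!), so we can write

\[
\mathcal{R}(T) = \left\langle\left\{1 - 3x + x^2,\ 2 + 2x + x^2\right\}\right\rangle
\]

and see that $\dim(\mathcal{R}(T)) = 2$. △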
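The two relations of linear dependence in Example BRLT, and the "check this!" for the remaining pair, can be verified with a short Python sketch. It assumes each polynomial in $P_2$ is stored as its coefficient triple (constant, $x$, $x^2$); the names `combo`, `relation_1`, `relation_2` and `minor` are illustrative.

```python
# Coefficient triples (constant, x, x^2) of the four polynomials in R.
p1 = (1, -3, 1)   # 1 - 3x + x^2
p2 = (2, 2, 1)    # 2 + 2x + x^2
p3 = (8, 0, 5)    # 8 + 5x^2
p4 = (1, 5, 0)    # 1 + 5x
polys = (p1, p2, p3, p4)

def combo(scalars, vectors):
    """Linear combination of coefficient triples with the given scalars."""
    return tuple(sum(s * v[i] for s, v in zip(scalars, vectors)) for i in range(3))

# The two relations of linear dependence quoted in the example.
relation_1 = combo((-2, -3, 1, 0), polys)
relation_2 = combo((1, -1, 0, 1), polys)

# p1 and p2 are linearly independent if some 2x2 minor of their
# coefficient triples is nonzero; here we use the first two coordinates.
minor = p1[0] * p2[1] - p1[1] * p2[0]
print(relation_1, relation_2, minor)
```

Both relations evaluating to the zero triple justifies discarding $8 + 5x^2$ and $1 + 5x$, and a nonzero minor confirms that the remaining two polynomials are independent.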

Elements of the range are precisely those elements of the codomain with nonempty preimages.

Theorem RPI Range and Pre-Image
Suppose that $T\colon U \to V$ is a linear transformation. Then

\[
\mathbf{v} \in \mathcal{R}(T) \text{ if and only if } T^{-1}(\mathbf{v}) \neq \emptyset
\]

Proof. ($\Rightarrow$) If $\mathbf{v} \in \mathcal{R}(T)$, then there is a vector $\mathbf{u} \in U$ such that $T(\mathbf{u}) = \mathbf{v}$. This qualifies $\mathbf{u}$ for membership in $T^{-1}(\mathbf{v})$, and thus the preimage of $\mathbf{v}$ is not empty.

($\Leftarrow$) Suppose the preimage of $\mathbf{v}$ is not empty, so we can choose a vector $\mathbf{u} \in U$ such that $T(\mathbf{u}) = \mathbf{v}$. Then $\mathbf{v} \in \mathcal{R}(T)$. ■

Now would be a good time to return to Diagram KPI which depicted the pre-images of a non-surjective linear transformation. The vectors $\mathbf{x}, \mathbf{y} \in V$ were elements of the codomain whose pre-images were empty, as we expect for a non-surjective linear transformation from the characterization in Theorem RPI.

Theorem  SLTB  Surjective  Linear  Transformations  and  Bases 
Suppose that $T\colon U \to V$ is a linear transformation and

\[
B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_m\}
\]

is a basis of $U$. Then $T$ is surjective if and only if

\[
C = \{T(\mathbf{u}_1), T(\mathbf{u}_2), T(\mathbf{u}_3), \ldots, T(\mathbf{u}_m)\}
\]

is a spanning set for $V$.

Proof. ($\Rightarrow$) Assume $T$ is surjective. Since $B$ is a basis, we know $B$ is a spanning set of $U$ (Definition B). Then Theorem SSRLT says that $C$ spans $\mathcal{R}(T)$. But the hypothesis that $T$ is surjective means $V = \mathcal{R}(T)$ (Theorem RSLT), so $C$ spans $V$.

($\Leftarrow$) Assume that $C$ spans $V$. To establish that $T$ is surjective, we will show that every element of $V$ is an output of $T$ for some input (Definition SLT). Suppose that $\mathbf{v} \in V$. As an element of $V$, we can write $\mathbf{v}$ as a linear combination of the spanning set $C$. So there are scalars, $b_1, b_2, b_3, \ldots, b_m$, such that

\[
\mathbf{v} = b_1T(\mathbf{u}_1) + b_2T(\mathbf{u}_2) + b_3T(\mathbf{u}_3) + \cdots + b_mT(\mathbf{u}_m)
\]

Now define the vector $\mathbf{u} \in U$ by

\[
\mathbf{u} = b_1\mathbf{u}_1 + b_2\mathbf{u}_2 + b_3\mathbf{u}_3 + \cdots + b_m\mathbf{u}_m
\]

Then

\begin{align*}
T(\mathbf{u}) &= T(b_1\mathbf{u}_1 + b_2\mathbf{u}_2 + b_3\mathbf{u}_3 + \cdots + b_m\mathbf{u}_m) \\
&= b_1T(\mathbf{u}_1) + b_2T(\mathbf{u}_2) + b_3T(\mathbf{u}_3) + \cdots + b_mT(\mathbf{u}_m) && \text{Theorem LTLC} \\
&= \mathbf{v}
\end{align*}

So, given any choice of a vector $\mathbf{v} \in V$, we can design an input $\mathbf{u} \in U$ to produce $\mathbf{v}$ as an output of $T$. Thus, by Definition SLT, $T$ is surjective. ■

Subsection SLTD

Surjective  Linear  Transformations  and  Dimension 

Theorem  SLTD  Surjective  Linear  Transformations  and  Dimension 

Suppose that $T\colon U \to V$ is a surjective linear transformation. Then $\dim(U) \ge \dim(V)$.

Proof. Suppose to the contrary that $m = \dim(U) < \dim(V) = t$. Let $B$ be a basis of $U$, which will then contain $m$ vectors. Apply $T$ to each element of $B$ to form a set $C$ that is a subset of $V$. By Theorem SLTB, $C$ is a spanning set of $V$ with $m$ or fewer vectors. So we have a set of $m$ or fewer vectors that span $V$, a vector space of dimension $t$, with $m < t$. However, this contradicts Theorem G, so our assumption is false and $\dim(U) \ge \dim(V)$. ■

Example NSDAT Not surjective by dimension, Archetype T
The linear transformation in Archetype T is

\[
T\colon P_4 \to P_5,\quad T(p(x)) = (x - 2)p(x)
\]

Since $\dim(P_4) = 5 < 6 = \dim(P_5)$, $T$ cannot be surjective, for then it would violate Theorem SLTD. △

Notice that the previous example made no use of the actual formula defining the function. Merely a comparison of the dimensions of the domain and codomain is enough to conclude that the linear transformation is not surjective. Archetype O and Archetype P are two more examples of linear transformations that have “small” domains and “big” codomains, resulting in an inability to create all possible outputs, and thus they are non-surjective linear transformations.
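The dimension comparisons in Theorem ILTD and Theorem SLTD amount to two one-line tests. The Python sketch below uses hypothetical function names; the important caveat, stated in the comments, is that these are only necessary conditions, so a result of `True` decides nothing on its own.

```python
def may_be_injective(dim_domain, dim_codomain):
    """Necessary condition from Theorem ILTD: dim(U) <= dim(V).

    False rules injectivity out; True does not guarantee it.
    """
    return dim_domain <= dim_codomain

def may_be_surjective(dim_domain, dim_codomain):
    """Necessary condition from Theorem SLTD: dim(U) >= dim(V).

    False rules surjectivity out; True does not guarantee it.
    """
    return dim_domain >= dim_codomain

# Archetype U maps M_23 (dimension 6) to C^4 (dimension 4), as in Example NIDAU.
archetype_U_injective = may_be_injective(6, 4)

# Archetype T maps P_4 (dimension 5) to P_5 (dimension 6), as in Example NSDAT.
archetype_T_surjective = may_be_surjective(5, 6)

print(archetype_U_injective, archetype_T_surjective)
```

Archetype Q shows why `True` is inconclusive: its domain and codomain both have dimension 5, so it passes the surjectivity test, yet Example NSAQ shows it is not surjective.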

Subsection  CSLT 

Composition  of  Surjective  Linear  Transformations 

In  Subsection  LT.NLTFO  we  saw  how  to  combine  linear  transformations  to  build 
new  linear  transformations,  specifically,  how  to  build  the  composition  of  two  linear 
transformations  (Definition  LTC).  It  will  be  useful  later  to  know  that  the  composition 
of  surjective  linear  transformations  is  again  surjective,  so  we  prove  that  here. 

Theorem  CSLTS  Composition  of  Surjective  Linear  Transformations  is  Surjective 
Suppose that $T\colon U\to V$ and $S\colon V\to W$ are surjective linear transformations. Then
$(S\circ T)\colon U\to W$ is a surjective linear transformation.

Proof.  That  the  composition  is  a linear  transformation  was  established  in  Theorem 
CLTLT,  so  we  need  only  establish  that  the  composition  is  surjective.  Applying 
Definition SLT, choose $\mathbf{w}\in W$.
Because $S$ is surjective, there must be a vector $\mathbf{v}\in V$ such that $S(\mathbf{v})=\mathbf{w}$. With
the existence of $\mathbf{v}$ established, that $T$ is surjective guarantees a vector $\mathbf{u}\in U$ such
that $T(\mathbf{u})=\mathbf{v}$. Now,

\begin{align*}
(S\circ T)(\mathbf{u}) &= S(T(\mathbf{u})) && \text{Definition LTC}\\
&= S(\mathbf{v}) && \text{Definition of } \mathbf{u}\\
&= \mathbf{w} && \text{Definition of } \mathbf{v}
\end{align*}

This establishes that any element of the codomain ($\mathbf{w}$) can be created by evaluating
$S\circ T$ with the right input ($\mathbf{u}$). Thus, by Definition SLT, $S\circ T$ is surjective. ■


Reading  Questions 

1. Suppose $T\colon\mathbb{C}^5\to\mathbb{C}^8$ is a linear transformation. Why is $T$ not surjective?

2.  What  is  the  relationship  between  a surjective  linear  transformation  and  its  range? 

3.  There  are  many  similarities  and  differences  between  injective  and  surjective  linear 
transformations.  Compare  and  contrast  these  two  different  types  of  linear  transformations. 
(This  means  going  well  beyond  just  stating  their  definitions.) 


Exercises 

C10 Each archetype below is a linear transformation. Compute the range for each.

Archetype M, Archetype N, Archetype O, Archetype P, Archetype Q, Archetype R,
Archetype S, Archetype T, Archetype U, Archetype V, Archetype W, Archetype X

C20 Example SAR concludes with an expression for a vector $\mathbf{u}\in\mathbb{C}^5$ that we believe
will create the vector $\mathbf{v}\in\mathbb{C}^5$ when used to evaluate $T$. That is, $T(\mathbf{u})=\mathbf{v}$. Verify this
assertion by actually evaluating $T$ with $\mathbf{u}$. If you do not have the patience to push around
all these symbols, try choosing a numerical instance of $\mathbf{v}$, compute $\mathbf{u}$, and then compute
$T(\mathbf{u})$, which should result in $\mathbf{v}$.

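C22† The linear transformation $S\colon\mathbb{C}^4\to\mathbb{C}^3$ is not surjective. Find an output $\mathbf{w}\in\mathbb{C}^3$
that has an empty pre-image (that is, $S^{-1}(\mathbf{w})=\emptyset$).

\[ S\left(\begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4 \end{bmatrix}\right) =
\begin{bmatrix} 2x_1+x_2+3x_3-4x_4\\ x_1+3x_2+4x_3+3x_4\\ -x_1+2x_2+x_3+7x_4 \end{bmatrix} \]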


C23† Determine whether or not the following linear transformation $T\colon\mathbb{C}^5\to P_3$ is
surjective:

\[ T\left(\begin{bmatrix} a\\ b\\ c\\ d\\ e \end{bmatrix}\right) = a + (b+c)x + (c+d)x^2 + (d+e)x^3 \]

C24† Determine whether or not the linear transformation $T\colon P_3\to\mathbb{C}^5$ below is surjective:

\[ T(a+bx+cx^2+dx^3) = \begin{bmatrix} a+b\\ b+c\\ c+d\\ a+c\\ b+d \end{bmatrix} \]

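C25† Define the linear transformation

\[ T\colon\mathbb{C}^3\to\mathbb{C}^2,\quad T\left(\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix}\right) =
\begin{bmatrix} 2x_1-x_2+5x_3\\ -4x_1+2x_2-10x_3 \end{bmatrix} \]

Find a basis for the range of $T$, $\mathcal{R}(T)$. Is $T$ surjective?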


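C26† Let $T\colon\mathbb{C}^3\to\mathbb{C}^3$ be given by

\[ T\left(\begin{bmatrix} a\\ b\\ c \end{bmatrix}\right) = \begin{bmatrix} a+b+2c\\ 2c\\ a+b+c \end{bmatrix} \]

Find a basis of $\mathcal{R}(T)$. Is $T$ surjective?

C27† Let $T\colon\mathbb{C}^3\to\mathbb{C}^4$ be given by

\[ T\left(\begin{bmatrix} a\\ b\\ c \end{bmatrix}\right) = \begin{bmatrix} a+b-c\\ a-b+c\\ -a+b+c\\ a+b+c \end{bmatrix} \]

Find a basis of $\mathcal{R}(T)$. Is $T$ surjective?

C28† Let $T\colon\mathbb{C}^4\to M_{22}$ be given by

\[ T\left(\begin{bmatrix} a\\ b\\ c\\ d \end{bmatrix}\right) = \begin{bmatrix} a+b & a+b+c\\ a+b+c & a+d \end{bmatrix} \]

Find a basis of $\mathcal{R}(T)$. Is $T$ surjective?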

C29† Let $T\colon P_2\to P_4$ be given by $T(p(x)) = x^2p(x)$. Find a basis of $\mathcal{R}(T)$. Is $T$
surjective?

C30† Let $T\colon P_4\to P_3$ be given by $T(p(x)) = p'(x)$, where $p'(x)$ is the derivative. Find a
basis of $\mathcal{R}(T)$. Is $T$ surjective?

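C40† Show that the linear transformation $T$ is not surjective by finding an element of
the codomain, $\mathbf{v}$, such that there is no vector $\mathbf{u}$ with $T(\mathbf{u})=\mathbf{v}$.

\[ T\colon\mathbb{C}^3\to\mathbb{C}^3,\quad T\left(\begin{bmatrix} a\\ b\\ c \end{bmatrix}\right) =
\begin{bmatrix} 2a+3b-c\\ 2b-2c\\ a-b+2c \end{bmatrix} \]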

M60 Suppose $U$ and $V$ are vector spaces. Define the function $Z\colon U\to V$ by $Z(\mathbf{u})=\mathbf{0}_V$
for every $\mathbf{u}\in U$. Then by Exercise LT.M60, $Z$ is a linear transformation. Formulate a
condition on $V$ that is equivalent to $Z$ being a surjective linear transformation. In other
words, fill in the blank to complete the following statement (and then give a proof): $Z$ is
surjective if and only if $V$ is ______. (See Exercise ILT.M60, Exercise IVLT.M60.)

T15† Suppose that $T\colon U\to V$ and $S\colon V\to W$ are linear transformations. Prove the
following relationship between ranges.

\[ \mathcal{R}(S\circ T)\subseteq\mathcal{R}(S) \]

T20† Suppose that $A$ is an $m\times n$ matrix. Define the linear transformation $T$ by

\[ T\colon\mathbb{C}^n\to\mathbb{C}^m,\quad T(\mathbf{x}) = A\mathbf{x} \]

Prove that the range of $T$ equals the column space of $A$, $\mathcal{R}(T)=\mathcal{C}(A)$.


Section  IVLT 

Invertible  Linear  Transformations 

In this section we will conclude our introduction to linear transformations by bringing
together the twin properties of injectivity and surjectivity and consider linear
transformations with both of these properties.


Subsection  IVLT 

Invertible  Linear  Transformations 


One  preliminary  definition,  and  then  we  will  have  our  main  definition  for  this  section. 


Definition  IDLT  Identity  Linear  Transformation 

The identity linear transformation on the vector space $W$ is defined as

\[ I_W\colon W\to W,\quad I_W(\mathbf{w})=\mathbf{w} \]


□ 


Informally, $I_W$ is the “do-nothing” function. You should check that $I_W$ is really
a linear transformation, as claimed, and then compute its kernel and range to see
that it is both injective and surjective. All of these facts should be straightforward
to verify (Exercise IVLT.T05). With this in hand we can make our main definition.

Definition  IVLT  Invertible  Linear  Transformations 

Suppose that $T\colon U\to V$ is a linear transformation. If there is a function $S\colon V\to U$
such that

\[ S\circ T = I_U \qquad T\circ S = I_V \]

then $T$ is invertible. In this case, we call $S$ the inverse of $T$ and write $S=T^{-1}$. □

Informally, a linear transformation $T$ is invertible if there is a companion linear
transformation, $S$, which “undoes” the action of $T$. When the two linear transformations
are applied consecutively (composition), in either order, the result is to have
no real effect. It is entirely analogous to squaring a positive number and then taking
its (positive) square root.

Here  is  an  example  of  a linear  transformation  that  is  invertible.  As  usual  at  the 
beginning  of  a section,  do  not  be  concerned  with  where  S came  from,  just  understand 
how  it  illustrates  Definition  IVLT. 


Example  AIVLT  An  invertible  linear  transformation 
Archetype V is the linear transformation

\[ T\colon P_3\to M_{22},\quad T(a+bx+cx^2+dx^3) = \begin{bmatrix} a+b & a-2c\\ d & b-d \end{bmatrix} \]

Define the function $S\colon M_{22}\to P_3$ by

\[ S\left(\begin{bmatrix} a & b\\ c & d \end{bmatrix}\right) = (a-c-d) + (c+d)x + \frac{1}{2}(a-b-c-d)x^2 + cx^3 \]

Then

\begin{align*}
(T\circ S)\left(\begin{bmatrix} a & b\\ c & d \end{bmatrix}\right)
&= T\left(S\left(\begin{bmatrix} a & b\\ c & d \end{bmatrix}\right)\right)\\
&= T\left((a-c-d) + (c+d)x + \tfrac{1}{2}(a-b-c-d)x^2 + cx^3\right)\\
&= \begin{bmatrix} (a-c-d)+(c+d) & (a-c-d)-2\left(\tfrac{1}{2}(a-b-c-d)\right)\\ c & (c+d)-c \end{bmatrix}\\
&= \begin{bmatrix} a & b\\ c & d \end{bmatrix}\\
&= I_{M_{22}}\left(\begin{bmatrix} a & b\\ c & d \end{bmatrix}\right)
\end{align*}

and

\begin{align*}
(S\circ T)(a+bx+cx^2+dx^3)
&= S\left(T(a+bx+cx^2+dx^3)\right)\\
&= S\left(\begin{bmatrix} a+b & a-2c\\ d & b-d \end{bmatrix}\right)\\
&= ((a+b)-d-(b-d)) + (d+(b-d))x\\
&\quad + \left(\tfrac{1}{2}((a+b)-(a-2c)-d-(b-d))\right)x^2 + (d)x^3\\
&= a+bx+cx^2+dx^3\\
&= I_{P_3}(a+bx+cx^2+dx^3)
\end{align*}

For now, understand why these computations show that $T$ is invertible, and that
$S=T^{-1}$. Maybe even be amazed by how $S$ works so perfectly in concert with $T$!
We will see later just how to arrive at the correct form of $S$ (when it is possible). △
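
The two compositions in Example AIVLT can be spot-checked mechanically. Below is a minimal Python sketch under our own conventions (not the text's): a polynomial $a+bx+cx^2+dx^3$ is stored as the tuple `(a, b, c, d)`, a matrix as a pair of rows, and the function names `T` and `S` simply mirror the example. A finite check like this is evidence, not a proof; the symbolic computation above is what establishes the identities.

```python
from fractions import Fraction
from itertools import product

def T(p):
    """Archetype V: P3 -> M22, on coefficient tuples (a, b, c, d)."""
    a, b, c, d = p
    return ((a + b, a - 2 * c), (d, b - d))

def S(m):
    """Candidate inverse M22 -> P3 from Example AIVLT."""
    (a, b), (c, d) = m
    return (a - c - d, c + d, Fraction(a - b - c - d, 2), c)

# Try every input with entries in {-2, ..., 2}.
values = range(-2, 3)
s_after_t_ok = all(S(T(p)) == p for p in product(values, repeat=4))
t_after_s_ok = all(
    T(S(((a, b), (c, d)))) == ((a, b), (c, d))
    for a, b, c, d in product(values, repeat=4)
)
print(s_after_t_ok, t_after_s_ok)
```

Exact rational arithmetic (`Fraction`) is used so that the factor of one half in `S` does not introduce rounding.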

It  can  be  as  instructive  to  study  a linear  transformation  that  is  not  invertible. 

Example  ANILT  A non-invertible  linear  transformation 
Consider the linear transformation $T\colon\mathbb{C}^3\to M_{22}$ defined by

\[ T\left(\begin{bmatrix} a\\ b\\ c \end{bmatrix}\right) = \begin{bmatrix} a-b & 2a+2b+c\\ 3a+b+c & -2a-6b-2c \end{bmatrix} \]

Suppose we were to search for an inverse function $S\colon M_{22}\to\mathbb{C}^3$.

First verify that the $2\times 2$ matrix $A=\begin{bmatrix} 5 & 3\\ 8 & 2 \end{bmatrix}$ is not in the range of $T$. This will
amount to finding an input to $T$, $\begin{bmatrix} a\\ b\\ c \end{bmatrix}$, such that

\begin{align*}
a-b &= 5\\
2a+2b+c &= 3\\
3a+b+c &= 8\\
-2a-6b-2c &= 2
\end{align*}

As this system of equations is inconsistent, there is no input column vector, and
$A\notin\mathcal{R}(T)$. How should we define $S(A)$? Note that

\[ T(S(A)) = (T\circ S)(A) = I_{M_{22}}(A) = A \]

So any definition we would provide for $S(A)$ must then be a column vector that
$T$ sends to $A$ and we would have $A\in\mathcal{R}(T)$, contrary to the definition of $T$. This is
enough to see that there is no function $S$ that will allow us to conclude that $T$ is
invertible, since we cannot provide a consistent definition for $S(A)$ if we assume $T$
is invertible.

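Even though we now know that $T$ is not invertible, let us not leave this example
just yet. Check that

\[ T\left(\begin{bmatrix} 1\\ -2\\ 4 \end{bmatrix}\right) = \begin{bmatrix} 3 & 2\\ 5 & 2 \end{bmatrix} = B
\qquad
T\left(\begin{bmatrix} 0\\ -3\\ 8 \end{bmatrix}\right) = \begin{bmatrix} 3 & 2\\ 5 & 2 \end{bmatrix} = B \]

How would we define $S(B)$?

\[ S(B) = S\left(T\left(\begin{bmatrix} 1\\ -2\\ 4 \end{bmatrix}\right)\right)
= (S\circ T)\left(\begin{bmatrix} 1\\ -2\\ 4 \end{bmatrix}\right)
= I_{\mathbb{C}^3}\left(\begin{bmatrix} 1\\ -2\\ 4 \end{bmatrix}\right)
= \begin{bmatrix} 1\\ -2\\ 4 \end{bmatrix} \]

or

\[ S(B) = S\left(T\left(\begin{bmatrix} 0\\ -3\\ 8 \end{bmatrix}\right)\right)
= (S\circ T)\left(\begin{bmatrix} 0\\ -3\\ 8 \end{bmatrix}\right)
= I_{\mathbb{C}^3}\left(\begin{bmatrix} 0\\ -3\\ 8 \end{bmatrix}\right)
= \begin{bmatrix} 0\\ -3\\ 8 \end{bmatrix} \]

Which definition should we provide for $S(B)$? Both are necessary. But then $S$ is
not a function. So we have a second reason to know that there is no function $S$ that
will allow us to conclude that $T$ is invertible. It happens that there are infinitely
many column vectors that $S$ would have to take to $B$. Construct the kernel of $T$,

\[ \mathcal{K}(T) = \left\langle\left\{ \begin{bmatrix} -1\\ -1\\ 4 \end{bmatrix} \right\}\right\rangle \]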
Now choose either of the two inputs used above for $T$ and add to it a scalar
multiple of the basis vector for the kernel of $T$. For example,

\[ \mathbf{x} = \begin{bmatrix} 1\\ -2\\ 4 \end{bmatrix} + (-2)\begin{bmatrix} -1\\ -1\\ 4 \end{bmatrix} = \begin{bmatrix} 3\\ 0\\ -4 \end{bmatrix} \]

then verify that $T(\mathbf{x})=B$. Practice creating a few more inputs for $T$ that would be
sent to $B$, and see why it is hopeless to think that we could ever provide a reasonable
definition for $S(B)$! There is a “whole subspace’s worth” of values that $S(B)$ would
have to take on. △
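
Both obstructions in Example ANILT can be checked with a few lines of arithmetic. This is a hedged Python sketch with our own encoding (column vectors as tuples, matrices as pairs of rows; `in_range_test` is a hypothetical helper name). It relies on two relations that hold for every output of $T$: the lower-left entry is the sum of the top row, and the lower-right entry is twice the upper-left minus twice the upper-right. These relations are our observation from the formula for $T$, used here as a necessary condition for membership in the range.

```python
def T(v):
    """The transformation of Example ANILT, C^3 -> M22."""
    a, b, c = v
    return ((a - b, 2 * a + 2 * b + c), (3 * a + b + c, -2 * a - 6 * b - 2 * c))

def in_range_test(m):
    """Necessary condition satisfied by every output of T (see lead-in)."""
    (p, q), (r, s) = m
    return r == p + q and s == 2 * p - 2 * q

A = ((5, 3), (8, 2))
B = T((1, -2, 4))

# A violates the second relation (2 != 2*5 - 2*3), so it cannot be an output.
a_passes = in_range_test(A)

# Adding any multiple of the kernel vector to one pre-image gives another.
kernel_vector = (-1, -1, 4)
start = (1, -2, 4)
preimages = [
    tuple(x + t * k for x, k in zip(start, kernel_vector)) for t in range(-3, 4)
]
all_hit_B = all(T(x) == B for x in preimages)
print(a_passes, B, all_hit_B)
```

The list `preimages` contains seven different inputs with the same output, which is the “multiply-defined” problem for any putative inverse.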

In  Example  ANILT  you  may  have  noticed  that  T is  not  surjective,  since  the 
matrix  A was  not  in  the  range  of  T . And  T is  not  injective  since  there  are  two 
different  input  column  vectors  that  T sends  to  the  matrix  B.  Linear  transformations 
T that  are  not  surjective  lead  to  putative  inverse  functions  S that  are  undefined  on 
inputs  outside  of  the  range  of  T . Linear  transformations  T that  are  not  injective 
lead  to  putative  inverse  functions  S that  are  multiply-defined  on  each  of  their  inputs. 
We  will  formalize  these  ideas  in  Theorem  ILTIS. 

But  first  notice  in  Definition  IVLT  that  we  only  require  the  inverse  (when  it 
exists)  to  be  a function.  When  it  does  exist,  it  too  is  a linear  transformation. 

Theorem  ILTLT  Inverse  of  a Linear  Transformation  is  a Linear  Transformation 
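Suppose that $T\colon U\to V$ is an invertible linear transformation. Then the function
$T^{-1}\colon V\to U$ is a linear transformation.

Proof. We work through verifying Definition LT for $T^{-1}$, using the fact that $T$ is a
linear transformation to obtain the second equality in each half of the proof. To this
end, suppose $\mathbf{x},\mathbf{y}\in V$ and $\alpha\in\mathbb{C}$.

\begin{align*}
T^{-1}(\mathbf{x}+\mathbf{y}) &= T^{-1}\left(T\left(T^{-1}(\mathbf{x})\right) + T\left(T^{-1}(\mathbf{y})\right)\right) && \text{Definition IVLT}\\
&= T^{-1}\left(T\left(T^{-1}(\mathbf{x}) + T^{-1}(\mathbf{y})\right)\right) && \text{Definition LT}\\
&= T^{-1}(\mathbf{x}) + T^{-1}(\mathbf{y}) && \text{Definition IVLT}
\end{align*}

Now check the second defining property of a linear transformation for $T^{-1}$,

\begin{align*}
T^{-1}(\alpha\mathbf{x}) &= T^{-1}\left(\alpha T\left(T^{-1}(\mathbf{x})\right)\right) && \text{Definition IVLT}\\
&= T^{-1}\left(T\left(\alpha T^{-1}(\mathbf{x})\right)\right) && \text{Definition LT}\\
&= \alpha T^{-1}(\mathbf{x}) && \text{Definition IVLT}
\end{align*}

So $T^{-1}$ fulfills the requirements of Definition LT and is therefore a linear
transformation. ■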

So when $T$ has an inverse, $T^{-1}$ is also a linear transformation. Furthermore, $T^{-1}$
is an invertible linear transformation and its inverse is what you might expect.

Theorem IILT Inverse of an Invertible Linear Transformation
Suppose that $T\colon U\to V$ is an invertible linear transformation. Then $T^{-1}$ is an
invertible linear transformation and $\left(T^{-1}\right)^{-1}=T$.

Proof. Because $T$ is invertible, Definition IVLT tells us there is a function $T^{-1}\colon V\to U$
such that

\[ T^{-1}\circ T = I_U \qquad T\circ T^{-1} = I_V \]

Additionally, Theorem ILTLT tells us that $T^{-1}$ is more than just a function,
it is a linear transformation. Now view these two statements as properties of the
linear transformation $T^{-1}$. In light of Definition IVLT, they together say that $T^{-1}$ is
invertible (let $T$ play the role of $S$ in the statement of the definition). Furthermore,
the inverse of $T^{-1}$ is then $T$, i.e. $\left(T^{-1}\right)^{-1}=T$. ■


Subsection  IV 
Invertibility 

We now know what an inverse linear transformation is, but just which linear
transformations have inverses? Here is a theorem we have been preparing for all chapter
long.

Theorem  ILTIS  Invertible  Linear  Transformations  are  Injective  and  Surjective 
Suppose $T\colon U\to V$ is a linear transformation. Then $T$ is invertible if and only if $T$
is injective and surjective.

Proof. (⇒) Since $T$ is presumed invertible, we can employ its inverse, $T^{-1}$ (Definition
IVLT). To see that $T$ is injective, suppose $\mathbf{x},\mathbf{y}\in U$ and assume that $T(\mathbf{x})=T(\mathbf{y})$,

\begin{align*}
\mathbf{x} &= I_U(\mathbf{x}) && \text{Definition IDLT}\\
&= \left(T^{-1}\circ T\right)(\mathbf{x}) && \text{Definition IVLT}\\
&= T^{-1}(T(\mathbf{x})) && \text{Definition LTC}\\
&= T^{-1}(T(\mathbf{y})) && \text{Definition ILT}\\
&= \left(T^{-1}\circ T\right)(\mathbf{y}) && \text{Definition LTC}\\
&= I_U(\mathbf{y}) && \text{Definition IVLT}\\
&= \mathbf{y} && \text{Definition IDLT}
\end{align*}

So by Definition ILT $T$ is injective.

To check that $T$ is surjective, suppose $\mathbf{v}\in V$. Then $T^{-1}(\mathbf{v})$ is a vector in $U$.
Compute

\begin{align*}
T\left(T^{-1}(\mathbf{v})\right) &= \left(T\circ T^{-1}\right)(\mathbf{v}) && \text{Definition LTC}\\
&= I_V(\mathbf{v}) && \text{Definition IVLT}\\
&= \mathbf{v} && \text{Definition IDLT}
\end{align*}

So there is an element from $U$, when used as an input to $T$ (namely $T^{-1}(\mathbf{v})$) that
produces the desired output, $\mathbf{v}$, and hence $T$ is surjective by Definition SLT.

(<=)  Now  assume  that  T is  both  injective  and  surjective.  We  will  build  a function 
$S\colon V\to U$ that will establish that $T$ is invertible. To this end, choose any $\mathbf{v}\in V$.
Since $T$ is surjective, Theorem RSLT says $\mathcal{R}(T)=V$, so we have $\mathbf{v}\in\mathcal{R}(T)$. Theorem
RPI says that the pre-image of $\mathbf{v}$, $T^{-1}(\mathbf{v})$, is nonempty. So we can choose a vector
from the pre-image of $\mathbf{v}$, say $\mathbf{u}$. In other words, there exists $\mathbf{u}\in T^{-1}(\mathbf{v})$.

Since $T^{-1}(\mathbf{v})$ is nonempty, Theorem KPI then says that

\[ T^{-1}(\mathbf{v}) = \left\{\mathbf{u}+\mathbf{z}\mid\mathbf{z}\in\mathcal{K}(T)\right\} \]

However, because $T$ is injective, by Theorem KILT the kernel is trivial, $\mathcal{K}(T)=\{\mathbf{0}\}$.
So the pre-image is a set with just one element, $T^{-1}(\mathbf{v})=\{\mathbf{u}\}$. Now we can
define $S$ by $S(\mathbf{v})=\mathbf{u}$. This is the key to this half of this proof. Normally the
preimage of a vector from the codomain might be an empty set, or an infinite set.
But surjectivity requires that the preimage not be empty, and then injectivity limits
the preimage to a singleton. Since our choice of $\mathbf{v}$ was arbitrary, we know that every
pre-image for $T$ is a set with a single element. This allows us to construct $S$ as a
function. Now that it is defined, verifying that it is the inverse of $T$ will be easy.
Here we go.

Choose $\mathbf{u}\in U$. Define $\mathbf{v}=T(\mathbf{u})$. Then $T^{-1}(\mathbf{v})=\{\mathbf{u}\}$, so that $S(\mathbf{v})=\mathbf{u}$ and,

\[ (S\circ T)(\mathbf{u}) = S(T(\mathbf{u})) = S(\mathbf{v}) = \mathbf{u} = I_U(\mathbf{u}) \]

and since our choice of $\mathbf{u}$ was arbitrary we have function equality, $S\circ T=I_U$.

Now choose $\mathbf{v}\in V$. Define $\mathbf{u}$ to be the single vector in the set $T^{-1}(\mathbf{v})$, in other
words, $\mathbf{u}=S(\mathbf{v})$. Then $T(\mathbf{u})=\mathbf{v}$, so

\[ (T\circ S)(\mathbf{v}) = T(S(\mathbf{v})) = T(\mathbf{u}) = \mathbf{v} = I_V(\mathbf{v}) \]

and since our choice of $\mathbf{v}$ was arbitrary we have function equality, $T\circ S=I_V$. ■

When a linear transformation is both injective and surjective, the pre-image of
any element of the codomain is a set of size one (a “singleton”). This fact allowed
us to construct the inverse linear transformation in one half of the proof of Theorem
ILTIS (see Proof Technique C) and is illustrated in the following cartoon. This
should remind you of the very general Diagram KPI which was used to illustrate
Theorem KPI about pre-images, only now we have an invertible linear transformation
which is therefore surjective and injective (Theorem ILTIS). As a surjective linear
transformation, there are no vectors depicted in the codomain, $V$, that have empty
pre-images. More importantly, as an injective linear transformation, the kernel is
trivial (Theorem KILT), so each pre-image is a single vector. This makes it possible
to “turn around” all the arrows to create the inverse linear transformation $T^{-1}$.

Diagram IVLT: Invertible Linear Transformation

Many  will  call  an  injective  and  surjective  function  a bijective  function  or  just  a 
bijection.  Theorem  ILTIS  tells  us  that  this  is  just  a synonym  for  the  term  invertible 
(which  we  will  use  exclusively). 

We  can  follow  the  constructive  approach  of  the  proof  of  Theorem  ILTIS  to 
construct  the  inverse  of  a specific  linear  transformation,  as  the  next  example  shows. 

Example CIVLT Computing the Inverse of a Linear Transformation
Consider the linear transformation $T\colon S_{22}\to P_2$ defined by

\[ T\left(\begin{bmatrix} a & b\\ b & c \end{bmatrix}\right) = (a+b+c) + (-a+2c)x + (2a+3b+6c)x^2 \]

$T$ is invertible, which you are able to verify, perhaps by determining that the
kernel of $T$ is trivial and the range of $T$ is all of $P_2$. This will be easier once we have
Theorem RPNDD, which appears later in this section.

By Theorem ILTIS we know $T^{-1}$ exists, and it will be critical shortly to realize
that $T^{-1}$ is automatically known to be a linear transformation as well (Theorem
ILTLT). To determine the complete behavior of $T^{-1}\colon P_2\to S_{22}$ we can simply
determine its action on a basis for the domain, $P_2$. This is the substance of Theorem
LTDB, and an excellent example of its application. Choose any basis of $P_2$, the
simpler the better, such as $B=\{1,\,x,\,x^2\}$. Values of $T^{-1}$ for these three basis
elements will be the single elements of their preimages. In turn, we have

$T^{-1}(1)$:

\[ T\left(\begin{bmatrix} a & b\\ b & c \end{bmatrix}\right) = 1 + 0x + 0x^2 \]

\[ \begin{bmatrix} 1 & 1 & 1 & 1\\ -1 & 0 & 2 & 0\\ 2 & 3 & 6 & 0 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 0 & 0 & -6\\ 0 & 1 & 0 & 10\\ 0 & 0 & 1 & -3 \end{bmatrix} \]

\begin{align*}
\text{(preimage)}\quad & T^{-1}(1) = \left\{\begin{bmatrix} -6 & 10\\ 10 & -3 \end{bmatrix}\right\}\\
\text{(function)}\quad & T^{-1}(1) = \begin{bmatrix} -6 & 10\\ 10 & -3 \end{bmatrix}
\end{align*}

$T^{-1}(x)$:

\[ T\left(\begin{bmatrix} a & b\\ b & c \end{bmatrix}\right) = 0 + 1x + 0x^2 \]

\[ \begin{bmatrix} 1 & 1 & 1 & 0\\ -1 & 0 & 2 & 1\\ 2 & 3 & 6 & 0 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 0 & 0 & -3\\ 0 & 1 & 0 & 4\\ 0 & 0 & 1 & -1 \end{bmatrix} \]

\begin{align*}
\text{(preimage)}\quad & T^{-1}(x) = \left\{\begin{bmatrix} -3 & 4\\ 4 & -1 \end{bmatrix}\right\}\\
\text{(function)}\quad & T^{-1}(x) = \begin{bmatrix} -3 & 4\\ 4 & -1 \end{bmatrix}
\end{align*}

$T^{-1}(x^2)$:

\[ T\left(\begin{bmatrix} a & b\\ b & c \end{bmatrix}\right) = 0 + 0x + 1x^2 \]

\[ \begin{bmatrix} 1 & 1 & 1 & 0\\ -1 & 0 & 2 & 0\\ 2 & 3 & 6 & 1 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 0 & 0 & 2\\ 0 & 1 & 0 & -3\\ 0 & 0 & 1 & 1 \end{bmatrix} \]

\begin{align*}
\text{(preimage)}\quad & T^{-1}(x^2) = \left\{\begin{bmatrix} 2 & -3\\ -3 & 1 \end{bmatrix}\right\}\\
\text{(function)}\quad & T^{-1}(x^2) = \begin{bmatrix} 2 & -3\\ -3 & 1 \end{bmatrix}
\end{align*}

Theorem LTDB says, informally, “it is enough to know what a linear transformation
does to a basis.” Formally, we have the outputs of $T^{-1}$ for a basis, so by
Theorem LTDB there is a unique linear transformation with these outputs. So we put
this information to work. The key step here is that we can convert any element of $P_2$
into a linear combination of the elements of the basis $B$ (Theorem VRRB). We are
after a “formula” for the value of $T^{-1}$ on a generic element of $P_2$, say $p+qx+rx^2$.

\begin{align*}
T^{-1}(p+qx+rx^2) &= T^{-1}\left(p(1)+q(x)+r(x^2)\right) && \text{Theorem VRRB}\\
&= pT^{-1}(1) + qT^{-1}(x) + rT^{-1}(x^2) && \text{Theorem LTLC}\\
&= p\begin{bmatrix} -6 & 10\\ 10 & -3 \end{bmatrix} + q\begin{bmatrix} -3 & 4\\ 4 & -1 \end{bmatrix} + r\begin{bmatrix} 2 & -3\\ -3 & 1 \end{bmatrix}\\
&= \begin{bmatrix} -6p-3q+2r & 10p+4q-3r\\ 10p+4q-3r & -3p-q+r \end{bmatrix}
\end{align*}

Notice how a linear combination in the domain of $T^{-1}$ has been translated
into a linear combination in the codomain of $T^{-1}$ since we know $T^{-1}$ is a linear
transformation by Theorem ILTLT.

Also, notice how the augmented matrices used to determine the three pre-images
could be combined into one calculation of a matrix in extended echelon form,
reminiscent of a procedure we know for computing the inverse of a matrix (see
Example CMI). Hmmmm. △
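
The closing remark of Example CIVLT, that the three augmented systems can be solved in one pass, can be sketched in Python. This is an illustration under our own assumptions, not the text's Sage code: the helper `rref` and the name `T_inverse` are ours, a symmetric matrix is encoded by its entries $(a, b, c)$, and a polynomial $p+qx+rx^2$ by $(p, q, r)$. We row-reduce the coefficient matrix of $T$ augmented with the identity and read the three pre-images off the right-hand block.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        if pivot_row == len(m):
            break
        found = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if found is None:
            continue
        m[pivot_row], m[found] = m[found], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m

# Rows are the coefficients of a, b, c in the constant, x and x^2 terms of T.
M = [[1, 1, 1], [-1, 0, 2], [2, 3, 6]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
augmented = [row + e for row, e in zip(M, identity)]

# Column j of the right-hand block is the pre-image of the j-th basis element.
inverse = [row[3:] for row in rref(augmented)]

def T_inverse(p, q, r):
    """Symmetric matrix sent by T to p + q x + r x^2."""
    a, b, c = (row[0] * p + row[1] * q + row[2] * r for row in inverse)
    return ((a, b), (b, c))

print(T_inverse(1, 0, 0), T_inverse(0, 1, 0), T_inverse(0, 0, 1))
```

Reading the block column by column reproduces the values of $T^{-1}(1)$, $T^{-1}(x)$ and $T^{-1}(x^2)$ found in the example.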

We will make frequent use of the characterization of invertible linear transformations
provided by Theorem ILTIS. The next theorem is a good example of this, and
we will use it often, too.

Theorem  CIVLT  Composition  of  Invertible  Linear  Transformations 

Suppose that $T\colon U\to V$ and $S\colon V\to W$ are invertible linear transformations. Then
the composition, $(S\circ T)\colon U\to W$ is an invertible linear transformation.

Proof. Since $S$ and $T$ are both linear transformations, $S\circ T$ is also a linear transformation
by Theorem CLTLT. Since $S$ and $T$ are both invertible, Theorem ILTIS says
that $S$ and $T$ are both injective and surjective. Then Theorem CILTI says $S\circ T$ is
injective, and Theorem CSLTS says $S\circ T$ is surjective. Now apply the “other half”
of Theorem ILTIS and conclude that $S\circ T$ is invertible. ■

When  a composition  is  invertible,  the  inverse  is  easy  to  construct. 

Theorem  ICLT  Inverse  of  a Composition  of  Linear  Transformations 

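Suppose that $T\colon U\to V$ and $S\colon V\to W$ are invertible linear transformations. Then
$S\circ T$ is invertible and $(S\circ T)^{-1} = T^{-1}\circ S^{-1}$.

Proof. Compute, for all $\mathbf{w}\in W$

\begin{align*}
\left((S\circ T)\circ\left(T^{-1}\circ S^{-1}\right)\right)(\mathbf{w}) &= S\left(T\left(T^{-1}\left(S^{-1}(\mathbf{w})\right)\right)\right)\\
&= S\left(I_V\left(S^{-1}(\mathbf{w})\right)\right) && \text{Definition IVLT}\\
&= S\left(S^{-1}(\mathbf{w})\right) && \text{Definition IDLT}\\
&= \mathbf{w} && \text{Definition IVLT}\\
&= I_W(\mathbf{w}) && \text{Definition IDLT}
\end{align*}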

§1 VLT 


Beezer:  A First  Course  in  Linear  Algebra 


489 


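So $(S\circ T)\circ\left(T^{-1}\circ S^{-1}\right) = I_W$, and also

\begin{align*}
\left(\left(T^{-1}\circ S^{-1}\right)\circ(S\circ T)\right)(\mathbf{u}) &= T^{-1}\left(S^{-1}\left(S\left(T(\mathbf{u})\right)\right)\right)\\
&= T^{-1}\left(I_V\left(T(\mathbf{u})\right)\right) && \text{Definition IVLT}\\
&= T^{-1}\left(T(\mathbf{u})\right) && \text{Definition IDLT}\\
&= \mathbf{u} && \text{Definition IVLT}\\
&= I_U(\mathbf{u}) && \text{Definition IDLT}
\end{align*}

so $\left(T^{-1}\circ S^{-1}\right)\circ(S\circ T) = I_U$.

By Definition IVLT, $S\circ T$ is invertible and $(S\circ T)^{-1} = T^{-1}\circ S^{-1}$. ■
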

Notice  that  this  theorem  not  only  establishes  what  the  inverse  of  S o T is,  it  also 
duplicates  the  conclusion  of  Theorem  CIVLT  and  also  establishes  the  invertibility  of 
S o T.  But  somehow,  the  proof  of  Theorem  CIVLT  is  a nicer  way  to  get  this  property. 

Does  Theorem  ICLT  remind  you  of  the  flavor  of  any  theorem  we  have  seen  about 
matrices?  (Hint:  Think  about  getting  dressed.)  Hmmmm. 
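
One way to explore the hint is to try the matrix analogue numerically. The sketch below is our own illustration (the two matrices and the helper names `matmul` and `inverse` are arbitrary choices, not from the text): it checks that, for a pair of invertible $2\times 2$ matrices, the inverse of a product is the product of the inverses in the reverse order, and that the same order generally fails.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(A):
    """Inverse of a 2 x 2 matrix with nonzero determinant."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# Two invertible matrices chosen only for illustration.
S = [[2, 1], [1, 1]]
T = [[1, 2], [3, 5]]

lhs = inverse(matmul(S, T))                    # (S T)^(-1)
reversed_order = matmul(inverse(T), inverse(S))  # T^(-1) S^(-1)
same_order = matmul(inverse(S), inverse(T))      # S^(-1) T^(-1)

print(lhs == reversed_order, lhs == same_order)
```

The reversal of order is the “getting dressed” pattern: what was put on last must come off first.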


Subsection  SI 

Structure  and  Isomorphism 

A vector  space  is  defined  (Definition  VS)  as  a set  of  objects  (“vectors”)  endowed 
with  a definition  of  vector  addition  (+)  and  a definition  of  scalar  multiplication 
(written  with  juxtaposition).  Many  of  our  definitions  about  vector  spaces  involve 
linear  combinations  (Definition  LC),  such  as  the  span  of  a set  (Definition  SS)  and 
linear  independence  (Definition  LI).  Other  definitions  are  built  up  from  these  ideas, 
such  as  bases  (Definition  B)  and  dimension  (Definition  D).  The  defining  properties 
of  a linear  transformation  require  that  a function  “respect”  the  operations  of  the 
two  vector  spaces  that  are  the  domain  and  the  codomain  (Definition  LT).  Finally,  an 
invertible  linear  transformation  is  one  that  can  be  “undone”  — it  has  a companion 
that  reverses  its  effect.  In  this  subsection  we  are  going  to  begin  to  roll  all  these  ideas 
into  one. 

A vector  space  has  “structure”  derived  from  definitions  of  the  two  operations 
and  the  requirement  that  these  operations  interact  in  ways  that  satisfy  the  ten 
properties  of  Definition  VS.  When  two  different  vector  spaces  have  an  invertible 
linear  transformation  defined  between  them,  then  we  can  translate  questions  about 
linear  combinations  (spans,  linear  independence,  bases,  dimension)  from  the  first 
vector  space  to  the  second.  The  answers  obtained  in  the  second  vector  space  can 
then  be  translated  back,  via  the  inverse  linear  transformation,  and  interpreted  in  the 
setting  of  the  first  vector  space.  We  say  that  these  invertible  linear  transformations 
“preserve  structure.”  And  we  say  that  the  two  vector  spaces  are  “structurally  the 
same.”  The  precise  term  is  “isomorphic,”  from  Greek  meaning  “of  the  same  form.” 
Let  us  begin  to  try  to  understand  this  important  concept. 


Definition  IVS  Isomorphic  Vector  Spaces 

Two vector spaces $U$ and $V$ are isomorphic if there exists an invertible linear
transformation $T$ with domain $U$ and codomain $V$, $T\colon U\to V$. In this case, we write
$U\cong V$, and the linear transformation $T$ is known as an isomorphism between $U$
and $V$. □


A few  comments  on  this  definition.  First,  be  careful  with  your  language  (Proof 
Technique  L).  Two  vector  spaces  are  isomorphic,  or  not.  It  is  a yes/no  situation  and 
the term only applies to a pair of vector spaces. Any invertible linear transformation
can be called an isomorphism; it is a term that applies to functions. Second, given
a pair  of  vector  spaces  there  might  be  several  different  isomorphisms  between  the 
two  vector  spaces.  But  it  only  takes  the  existence  of  one  to  call  the  pair  isomorphic. 
Third,  U isomorphic  to  V,  or  V isomorphic  to  U?  It  does  not  matter,  since  the 
inverse  linear  transformation  will  provide  the  needed  isomorphism  in  the  “opposite” 
direction.  Being  “isomorphic  to”  is  an  equivalence  relation  on  the  set  of  all  vector 
spaces  (see  Theorem  SER  for  a reminder  about  equivalence  relations). 


Example  IVSAV  Isomorphic  vector  spaces,  Archetype  V 
Archetype V is a linear transformation from $P_3$ to $M_{22}$,

\[ T\colon P_3\to M_{22},\quad T(a+bx+cx^2+dx^3) = \begin{bmatrix} a+b & a-2c\\ d & b-d \end{bmatrix} \]

Since it is injective and surjective, Theorem ILTIS tells us that it is an invertible
linear transformation. By Definition IVS we say $P_3$ and $M_{22}$ are isomorphic.

At a basic level, the term “isomorphic” is nothing more than a codeword for the
presence of an invertible linear transformation. However, it is also a description of
a powerful idea, and this power only becomes apparent in the course of studying
examples and related theorems. In this example, we are led to believe that there is
nothing “structurally” different about $P_3$ and $M_{22}$. In a certain sense they are the
same. Not equal, but the same. One is as good as the other. One is just as interesting
as the other.

Here is an extremely basic application of this idea. Suppose we want to compute
the following linear combination of polynomials in $P_3$,

\[ 5(2+3x-4x^2+5x^3) + (-3)(3-5x+3x^2+x^3) \]


Rather than doing it straight-away (which is very easy), we will apply the
transformation $T$ to convert into a linear combination of matrices, and then compute
in $M_{22}$ according to the definitions of the vector space operations there (Example
VSM),

\begin{align*}
&T\left(5(2+3x-4x^2+5x^3) + (-3)(3-5x+3x^2+x^3)\right)\\
&\quad= 5T(2+3x-4x^2+5x^3) + (-3)T(3-5x+3x^2+x^3) && \text{Theorem LTLC}\\
&\quad= 5\begin{bmatrix} 5 & 10\\ 5 & -2 \end{bmatrix} + (-3)\begin{bmatrix} -2 & -3\\ 1 & -6 \end{bmatrix} && \text{Definition of } T\\
&\quad= \begin{bmatrix} 31 & 59\\ 22 & 8 \end{bmatrix} && \text{Operations in } M_{22}
\end{align*}

Now we will translate our answer back to $P_3$ by applying $T^{-1}$, which we
demonstrated in Example AIVLT,

\[ T^{-1}\colon M_{22}\to P_3,\quad T^{-1}\left(\begin{bmatrix} a & b\\ c & d \end{bmatrix}\right) = (a-c-d) + (c+d)x + \frac{1}{2}(a-b-c-d)x^2 + cx^3 \]

We compute,

\[ T^{-1}\left(\begin{bmatrix} 31 & 59\\ 22 & 8 \end{bmatrix}\right) = 1 + 30x - 29x^2 + 22x^3 \]

which is, as expected, exactly what we would have computed for the original linear
combination had we just used the definitions of the operations in $P_3$ (Example VSP).
Notice this is meant only as an illustration and not a suggested route for doing this
particular computation. △
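
The “direct” and “roundabout” routes of Example IVSAV can be compared side by side. The following Python sketch uses our own encoding (polynomials in $P_3$ as coefficient tuples, matrices as pairs of rows) and our own names `T`, `T_inverse`, `direct` and `roundabout`; it is an illustration of the idea, not a recommended way to add polynomials.

```python
from fractions import Fraction

def T(p):
    """Archetype V: P3 -> M22, on coefficient tuples (a, b, c, d)."""
    a, b, c, d = p
    return ((a + b, a - 2 * c), (d, b - d))

def T_inverse(m):
    """Inverse of Archetype V, as given in Example AIVLT."""
    (a, b), (c, d) = m
    return (a - c - d, c + d, Fraction(a - b - c - d, 2), c)

p1 = (2, 3, -4, 5)   # 2 + 3x - 4x^2 + 5x^3
p2 = (3, -5, 3, 1)   # 3 - 5x + 3x^2 + x^3

# Direct route: form the linear combination inside P3.
direct = tuple(5 * x + (-3) * y for x, y in zip(p1, p2))

# Roundabout route: move to M22, combine there, then come back.
m1, m2 = T(p1), T(p2)
combined = tuple(
    tuple(5 * x + (-3) * y for x, y in zip(row1, row2))
    for row1, row2 in zip(m1, m2)
)
roundabout = T_inverse(combined)

print(direct, combined, roundabout)
```

Both routes land on the same polynomial, which is what it means for the isomorphism to preserve the structure of linear combinations.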


In  Example  IVSAV  we  avoided  a computation  in  P3  by  a conversion  of  the 
computation  to  a new  vector  space,  M22,  via  an  invertible  linear  transformation  (also 
known  as  an  isomorphism).  Here  is  a diagram  meant  to  illustrate  the  more  general 
situation  of  two  vector  spaces,  U and  V,  and  an  invertible  linear  transformation, 
T.  The  diagram  is  simply  about  a sum  of  two  vectors  from  U,  rather  than  a more 
involved  linear  combination.  It  should  remind  you  of  Diagram  DLTA. 


    u1, u2 ---------- T ---------> T(u1), T(u2)
       |                               |
       | +                             | +
       v                               v
    u1 + u2 <------ T^{-1} ------- T(u1 + u2) = T(u1) + T(u2)

Diagram AIVS: Addition in Isomorphic Vector Spaces

To understand this diagram, begin in the upper-left corner, and by going straight
down we can compute the sum of the two vectors using the addition for the vector
space $U$. The more circuitous alternative, in the spirit of Example IVSAV, is to
begin in the upper-left corner and then proceed clockwise around the other three
sides of the rectangle. Notice that the vector addition is accomplished using the
addition in the vector space $V$. Then, because $T$ is a linear transformation, we can
say that the result of $T(\mathbf{u}_1)+T(\mathbf{u}_2)$ is equal to $T(\mathbf{u}_1+\mathbf{u}_2)$. Then the key feature
is to recognize that applying $T^{-1}$ obviously converts the second version of this result
into the sum in the lower-left corner. So there are two routes to the sum $\mathbf{u}_1+\mathbf{u}_2$,
each employing an addition from a different vector space, but one is “direct” and
the other is “roundabout”. You might try designing a similar diagram for the case
of scalar multiplication (see Diagram DLTM) or for a full linear combination.

Checking  the  dimensions  of  two  vector  spaces  can  be  a quick  way  to  establish 
that  they  are  not  isomorphic.  Here  is  the  theorem. 

Theorem  IVSED  Isomorphic  Vector  Spaces  have  Equal  Dimension 
Suppose $U$ and $V$ are isomorphic vector spaces. Then $\dim(U)=\dim(V)$.

Proof. If $U$ and $V$ are isomorphic, there is an invertible linear transformation
$T\colon U\to V$ (Definition IVS). $T$ is injective by Theorem ILTIS and so by Theorem
ILTD, $\dim(U)\leq\dim(V)$. Similarly, $T$ is surjective by Theorem ILTIS and so by
Theorem SLTD, $\dim(U)\geq\dim(V)$. The net effect of these two inequalities is that
$\dim(U)=\dim(V)$. ■

The  contrapositive  of  Theorem  IVSED  says  that  if  U and  V have  different 
dimensions,  then  they  are  not  isomorphic.  Dimension  is  the  simplest  “structural” 
characteristic  that  will  allow  you  to  distinguish  non-isomorphic  vector  spaces.  For 
example, $P_6$ is not isomorphic to $M_{34}$ since their dimensions (7 and 12, respectively)
are  not  equal.  With  tools  developed  in  Section  VR  we  will  be  able  to  establish  that 
the  converse  of  Theorem  IVSED  is  true.  Think  about  that  one  for  a moment. 


Subsection  RNLT 

Rank  and  Nullity  of  a Linear  Transformation 


Just  as  a matrix  has  a rank  and  a nullity,  so  too  do  linear  transformations.  And 
just  like  the  rank  and  nullity  of  a matrix  are  related  (they  sum  to  the  number  of 
columns,  Theorem  RPNC)  the  rank  and  nullity  of  a linear  transformation  are  related. 
Here are the definitions and theorems; see the Archetypes (Archetypes) for loads of
examples.


Definition  ROLT  Rank  Of  a Linear  Transformation 

Suppose that $T\colon U \to V$ is a linear transformation. Then the rank of $T$, $r(T)$, is
the dimension of the range of $T$,

\[ r(T) = \dim(\mathcal{R}(T)) \]


□ 


Definition  NOLT  Nullity  Of  a Linear  Transformation 

Suppose that $T\colon U \to V$ is a linear transformation. Then the nullity of $T$, $n(T)$, is
the dimension of the kernel of $T$,

\[ n(T) = \dim(\mathcal{K}(T)) \]


□

Here are two quick theorems.

Theorem  ROSLT  Rank  Of  a Surjective  Linear  Transformation 

Suppose that $T\colon U \to V$ is a linear transformation. Then the rank of $T$ is the
dimension of $V$, $r(T) = \dim(V)$, if and only if $T$ is surjective.

Proof. By Theorem RSLT, $T$ is surjective if and only if $\mathcal{R}(T) = V$. Applying
Definition ROLT, $\mathcal{R}(T) = V$ if and only if $r(T) = \dim(\mathcal{R}(T)) = \dim(V)$. ■

Theorem  NOILT  Nullity  Of  an  Injective  Linear  Transformation 

Suppose that $T\colon U \to V$ is a linear transformation. Then the nullity of $T$ is zero,
$n(T) = 0$, if and only if $T$ is injective.

Proof. By Theorem KILT, $T$ is injective if and only if $\mathcal{K}(T) = \{\mathbf{0}\}$. Applying
Definition NOLT, $\mathcal{K}(T) = \{\mathbf{0}\}$ if and only if $n(T) = 0$. ■

Just as injectivity and surjectivity come together in invertible linear transformations,
there is a clear relationship between rank and nullity of a linear transformation.
If one is big, the other is small.

Theorem  RPNDD  Rank  Plus  Nullity  is  Domain  Dimension 
Suppose that $T\colon U \to V$ is a linear transformation. Then

\[ r(T) + n(T) = \dim(U) \]

Proof. Let $r = r(T)$ and $s = n(T)$. Suppose that $R = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_r\} \subseteq V$ is
a basis of the range of $T$, $\mathcal{R}(T)$, and $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_s\} \subseteq U$ is a basis of the
kernel of $T$, $\mathcal{K}(T)$. Note that $R$ and $S$ are possibly empty, which means that some
of the sums in this proof are “empty” and are equal to the zero vector.

Because the elements of $R$ are all in the range of $T$, each must have a nonempty
pre-image by Theorem RPI. Choose vectors $\mathbf{w}_i \in U$, $1 \le i \le r$, such that $\mathbf{w}_i \in T^{-1}(\mathbf{v}_i)$.
So $T(\mathbf{w}_i) = \mathbf{v}_i$, $1 \le i \le r$. Consider the set

\[ B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_s, \mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3, \ldots, \mathbf{w}_r\} \]

We claim that $B$ is a basis for $U$.

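To establish linear independence for $B$, begin with a relation of linear dependence
on $B$. So suppose there are scalars $a_1, a_2, a_3, \ldots, a_s$ and $b_1, b_2, b_3, \ldots, b_r$ such that

\[ \mathbf{0} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_s\mathbf{u}_s + b_1\mathbf{w}_1 + b_2\mathbf{w}_2 + b_3\mathbf{w}_3 + \cdots + b_r\mathbf{w}_r \]

Then

\begin{align*}
\mathbf{0} &= T(\mathbf{0}) && \text{Theorem LTTZZ} \\
&= T(a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_s\mathbf{u}_s + b_1\mathbf{w}_1 + b_2\mathbf{w}_2 + b_3\mathbf{w}_3 + \cdots + b_r\mathbf{w}_r) && \text{Definition LI} \\
&= a_1T(\mathbf{u}_1) + a_2T(\mathbf{u}_2) + a_3T(\mathbf{u}_3) + \cdots + a_sT(\mathbf{u}_s) \\
&\qquad + b_1T(\mathbf{w}_1) + b_2T(\mathbf{w}_2) + b_3T(\mathbf{w}_3) + \cdots + b_rT(\mathbf{w}_r) && \text{Theorem LTLC} \\
&= a_1\mathbf{0} + a_2\mathbf{0} + a_3\mathbf{0} + \cdots + a_s\mathbf{0} \\
&\qquad + b_1T(\mathbf{w}_1) + b_2T(\mathbf{w}_2) + b_3T(\mathbf{w}_3) + \cdots + b_rT(\mathbf{w}_r) && \text{Definition KLT} \\
&= \mathbf{0} + \mathbf{0} + \mathbf{0} + \cdots + \mathbf{0} \\
&\qquad + b_1T(\mathbf{w}_1) + b_2T(\mathbf{w}_2) + b_3T(\mathbf{w}_3) + \cdots + b_rT(\mathbf{w}_r) && \text{Theorem ZVSM} \\
&= b_1T(\mathbf{w}_1) + b_2T(\mathbf{w}_2) + b_3T(\mathbf{w}_3) + \cdots + b_rT(\mathbf{w}_r) && \text{Property Z} \\
&= b_1\mathbf{v}_1 + b_2\mathbf{v}_2 + b_3\mathbf{v}_3 + \cdots + b_r\mathbf{v}_r && \text{Definition PI}
\end{align*}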

This is a relation of linear dependence on $R$ (Definition RLD), and since $R$ is a
linearly independent set (Definition LI), we see that $b_1 = b_2 = b_3 = \cdots = b_r = 0$.
Then the original relation of linear dependence on $B$ becomes

\begin{align*}
\mathbf{0} &= a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_s\mathbf{u}_s + 0\mathbf{w}_1 + 0\mathbf{w}_2 + \cdots + 0\mathbf{w}_r \\
&= a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_s\mathbf{u}_s + \mathbf{0} + \mathbf{0} + \cdots + \mathbf{0} && \text{Theorem ZSSM} \\
&= a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_s\mathbf{u}_s && \text{Property Z}
\end{align*}

But this is again a relation of linear dependence (Definition RLD), now on the
set $S$. Since $S$ is linearly independent (Definition LI), we have $a_1 = a_2 = a_3 = \cdots =
a_s = 0$. Since we now know that all the scalars in the relation of linear dependence on
$B$ must be zero, we have established the linear independence of $B$ through Definition
LI.

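To now establish that $B$ spans $U$, choose an arbitrary vector $\mathbf{u} \in U$. Then
$T(\mathbf{u}) \in \mathcal{R}(T)$, so there are scalars $c_1, c_2, c_3, \ldots, c_r$ such that

\[ T(\mathbf{u}) = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 + \cdots + c_r\mathbf{v}_r \]

Use the scalars $c_1, c_2, c_3, \ldots, c_r$ to define a vector $\mathbf{y} \in U$,

\[ \mathbf{y} = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3 + \cdots + c_r\mathbf{w}_r \]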

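Then

\begin{align*}
T(\mathbf{u} - \mathbf{y}) &= T(\mathbf{u}) - T(\mathbf{y}) && \text{Theorem LTLC} \\
&= T(\mathbf{u}) - T(c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3 + \cdots + c_r\mathbf{w}_r) && \text{Substitution} \\
&= T(\mathbf{u}) - (c_1T(\mathbf{w}_1) + c_2T(\mathbf{w}_2) + \cdots + c_rT(\mathbf{w}_r)) && \text{Theorem LTLC} \\
&= T(\mathbf{u}) - (c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 + \cdots + c_r\mathbf{v}_r) && \mathbf{w}_i \in T^{-1}(\mathbf{v}_i) \\
&= T(\mathbf{u}) - T(\mathbf{u}) && \text{Substitution} \\
&= \mathbf{0} && \text{Property AI}
\end{align*}

So the vector $\mathbf{u} - \mathbf{y}$ is sent to the zero vector by $T$ and hence is an element of the
kernel of $T$. As such it can be written as a linear combination of the basis vectors
for $\mathcal{K}(T)$, the elements of the set $S$. So there are scalars $d_1, d_2, d_3, \ldots, d_s$ such that

\[ \mathbf{u} - \mathbf{y} = d_1\mathbf{u}_1 + d_2\mathbf{u}_2 + d_3\mathbf{u}_3 + \cdots + d_s\mathbf{u}_s \]

Then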

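\begin{align*}
\mathbf{u} &= (\mathbf{u} - \mathbf{y}) + \mathbf{y} \\
&= d_1\mathbf{u}_1 + d_2\mathbf{u}_2 + d_3\mathbf{u}_3 + \cdots + d_s\mathbf{u}_s + c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3 + \cdots + c_r\mathbf{w}_r
\end{align*}

This says that for any vector, $\mathbf{u}$, from $U$, there exist scalars ($d_1, d_2, d_3, \ldots, d_s$,
$c_1, c_2, c_3, \ldots, c_r$) that form $\mathbf{u}$ as a linear combination of the vectors in the set $B$.
In other words, $B$ spans $U$ (Definition SS).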

So $B$ is a basis (Definition B) of $U$ with $s + r$ vectors, and thus

\[ \dim(U) = s + r = n(T) + r(T) \]

as desired. ■

Theorem RPNC said that the rank and nullity of a matrix sum to the number of
columns of the matrix. This result is now an easy consequence of Theorem RPNDD
when we consider the linear transformation $T\colon \mathbb{C}^n \to \mathbb{C}^m$ defined with the $m \times n$
matrix $A$ by $T(\mathbf{x}) = A\mathbf{x}$. The range and kernel of $T$ are identical to the column
space and null space of the matrix $A$ (Exercise ILT.T20, Exercise SLT.T20), so the
rank and nullity of the matrix $A$ are identical to the rank and nullity of the linear
transformation $T$. The dimension of the domain of $T$ is the dimension of $\mathbb{C}^n$, exactly
the number of columns for the matrix $A$.
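
The matrix version of Theorem RPNDD can be checked by direct computation. The sketch below is only an illustration: the matrix `A` is a made-up example (not one of the archetypes), and the helper names `rref` and `null_space_basis` are hypothetical. It row-reduces `A` with exact rational arithmetic, counts pivot columns for the rank, builds one kernel vector per free column for the nullity, and compares the sum with the number of columns.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_space_basis(rows):
    """Basis of the null space: one vector per free (non-pivot) column."""
    reduced, pivots = rref(rows)
    ncols = len(rows[0])
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        z = [Fraction(0)] * ncols
        z[free] = Fraction(1)
        for row_index, pivot_col in enumerate(pivots):
            z[pivot_col] = -reduced[row_index][free]
        basis.append(z)
    return basis

# Hypothetical 3 x 4 matrix (not from the text); row 3 = row 1 + row 2.
A = [[1, 2, 0, 1],
     [0, 1, 1, 0],
     [1, 3, 1, 1]]

_, pivot_columns = rref(A)
rank = len(pivot_columns)
kernel_basis = null_space_basis(A)
nullity = len(kernel_basis)

# Each basis vector really is sent to the zero vector by T(x) = Ax.
for z in kernel_basis:
    assert all(sum(a * x for a, x in zip(row, z)) == 0 for row in A)

print(rank, nullity, len(A[0]))  # rank + nullity equals the number of columns
```

Exact fractions are used instead of floating point so that "zero" means exactly zero, in keeping with the algebraic (rather than numerical) spirit of the text.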

This theorem can be especially useful in determining basic properties of linear
transformations. For example, suppose that $T\colon \mathbb{C}^6 \to \mathbb{C}^6$ is a linear transformation
and you are able to quickly establish that the kernel is trivial. Then $n(T) = 0$. First
this means that $T$ is injective by Theorem NOILT. Also, Theorem RPNDD becomes

\[ 6 = \dim(\mathbb{C}^6) = r(T) + n(T) = r(T) + 0 = r(T) \]

So the rank of $T$ is equal to the dimension of the codomain, and by Theorem ROSLT
we know $T$ is surjective. Finally, we know $T$ is invertible by Theorem ILTIS. So from
the determination that the kernel is trivial, and consideration of various dimensions,
the theorems of this section allow us to conclude the existence of an inverse linear
transformation for $T$. Similarly, Theorem RPNDD can be used to provide alternative
proofs for Theorem ILTD, Theorem SLTD and Theorem IVSED. It would be an
interesting exercise to construct these proofs.

It  would  be  instructive  to  study  the  archetypes  that  are  linear  transformations 
and  see  how  many  of  their  properties  can  be  deduced  just  from  considering  only  the 
dimensions  of  the  domain  and  codomain.  Then  add  in  just  knowledge  of  either  the 
nullity  or  rank,  and  see  how  much  more  you  can  learn  about  the  linear  transformation. 
The  table  preceding  all  of  the  archetypes  (Archetypes)  could  be  a good  place  to 
start this analysis.

Subsection SLELT

Systems  of  Linear  Equations  and  Linear  Transformations 


This  subsection  does  not  really  belong  in  this  section,  or  any  other  section,  for  that 
matter.  It  is  just  the  right  time  to  have  a discussion  about  the  connections  between 
the  central  topic  of  linear  algebra,  linear  transformations,  and  our  motivating  topic 
from  Chapter  SLE,  systems  of  linear  equations.  We  will  discuss  several  theorems  we 
have  seen  already,  but  we  will  also  make  some  forward-looking  statements  that  will 
be  justified  in  Chapter  R. 

Archetype D and Archetype E are ideal examples to illustrate connections with
linear transformations. Both have the same coefficient matrix,

\[
D = \begin{bmatrix}
2 & 1 & 7 & -7 \\
-3 & 4 & -5 & -6 \\
1 & 1 & 4 & -5
\end{bmatrix}
\]

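To apply the theory of linear transformations to these two archetypes, employ
the matrix-vector product (Definition MVP) and define the linear transformation,

\[
T\colon \mathbb{C}^4 \to \mathbb{C}^3, \quad T(\mathbf{x}) = D\mathbf{x}
= x_1 \begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix}
+ x_2 \begin{bmatrix} 1 \\ 4 \\ 1 \end{bmatrix}
+ x_3 \begin{bmatrix} 7 \\ -5 \\ 4 \end{bmatrix}
+ x_4 \begin{bmatrix} -7 \\ -6 \\ -5 \end{bmatrix}
\]
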

Theorem MBLT tells us that $T$ is indeed a linear transformation. Archetype
D asks for solutions to $\mathcal{LS}(D, \mathbf{b})$, where

\[ \mathbf{b} = \begin{bmatrix} 8 \\ -12 \\ 4 \end{bmatrix} \]

In the language of linear transformations this is equivalent to asking for $T^{-1}(\mathbf{b})$.
In the language of vectors and matrices it asks for a linear combination of the four
columns of $D$ that will equal $\mathbf{b}$. One solution listed is a vector $\mathbf{w}$. With a nonempty
preimage, Theorem KPI tells us that the complete solution set of the linear system is
the preimage of $\mathbf{b}$,

\[ \mathbf{w} + \mathcal{K}(T) = \{\, \mathbf{w} + \mathbf{z} \mid \mathbf{z} \in \mathcal{K}(T) \,\} \]

The kernel of the linear transformation $T$ is exactly the null space of the matrix
$D$ (see Exercise ILT.T20), so this approach to the solution set should be reminiscent
of Theorem PSPHS. The kernel of the linear transformation is the preimage of the
zero vector, exactly equal to the solution set of the homogeneous system $\mathcal{LS}(D, \mathbf{0})$.
Since $D$ has a null space of dimension two, every preimage (and in particular the
preimage of $\mathbf{b}$) is as “big” as a subspace of dimension two (but is not a subspace).
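
The description of the preimage as a particular solution plus the kernel can be spot-checked numerically. The following is a sketch under stated assumptions: the vector of constants is taken to be (8, -12, 4), the particular solution `w` is simply one solution chosen for this illustration, and the kernel vectors `z1`, `z2` were obtained by row-reducing $D$ by hand; the code verifies all three claims rather than trusting them.

```python
from itertools import product

# Coefficient matrix shared by Archetype D and Archetype E.
D = [[2, 1, 7, -7],
     [-3, 4, -5, -6],
     [1, 1, 4, -5]]

def T(x):
    """The linear transformation T(x) = Dx, as a matrix-vector product."""
    return [sum(d * xi for d, xi in zip(row, x)) for row in D]

b = [8, -12, 4]       # assumed vector of constants for Archetype D
w = [0, 1, 2, 1]      # one particular solution, chosen for this sketch
z1 = [-3, -1, 1, 0]   # two kernel vectors found by hand row reduction
z2 = [2, 3, 0, 1]

assert T(w) == b                      # w lies in the preimage of b
assert T(z1) == [0, 0, 0]             # z1 and z2 lie in the kernel
assert T(z2) == [0, 0, 0]

# Every vector w + s*z1 + t*z2 is again sent to b by T.
solutions = []
for s, t in product(range(-2, 3), repeat=2):
    x = [wi + s * a + t * c for wi, a, c in zip(w, z1, z2)]
    assert T(x) == b
    solutions.append(x)
```

Only finitely many integer choices of the two parameters are tried here; the general statement for all scalars is the content of Theorem KPI.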

Archetype E is identical to Archetype D but with a different vector of constants,

\[ \mathbf{d} = \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix} \]

We can use the same linear transformation $T$ to discuss this system
of equations since the coefficient matrix is identical. Now the set of solutions to
$\mathcal{LS}(D, \mathbf{d})$ is the pre-image of $\mathbf{d}$, $T^{-1}(\mathbf{d})$. However, the vector $\mathbf{d}$ is not in the range
of the linear transformation (nor is it in the column space of the matrix, since these
two sets are equal by Exercise SLT.T20). So the empty pre-image is equivalent to
the inconsistency of the linear system.
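
One way to confirm that $\mathbf{d}$ lies outside the range of $T$ without solving anything is a certificate vector. This is a different technique from the row reduction used in the archetypes, offered only as a sketch: the vector `ell` below (a hypothetical name, found by hand) is orthogonal to every column of $D$, so it is orthogonal to every vector in the column space. Any vector with a nonzero dot product against `ell` therefore cannot be in the range.

```python
D = [[2, 1, 7, -7],
     [-3, 4, -5, -6],
     [1, 1, 4, -5]]

b = [8, -12, 4]   # assumed constants for Archetype D (consistent system)
d = [2, 3, 2]     # constants for Archetype E (inconsistent system)

ell = [7, 1, -11]  # hand-computed certificate; checked below, not assumed

def dot(u, v):
    """Ordinary dot product of two equal-length lists."""
    return sum(a * c for a, c in zip(u, v))

columns = list(zip(*D))  # the four columns of D

# ell annihilates every column, hence every linear combination of columns.
annihilates_columns = all(dot(ell, col) == 0 for col in columns)

# b passes the necessary test for membership in the range; d fails it.
b_passes = dot(ell, b) == 0
d_in_range_possible = dot(ell, d) == 0

print(annihilates_columns, b_passes, d_in_range_possible)
```

Because the column space of $D$ has dimension two inside a three-dimensional codomain, passing this single test is also sufficient for membership, which is why it separates $\mathbf{b}$ from $\mathbf{d}$ cleanly.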

These two archetypes each have three equations in four variables, so either the
resulting linear systems are inconsistent, or they are consistent and application of
Theorem CMVEI tells us that the system has infinitely many solutions. Considering
these same parameters for the linear transformation, the dimension of the domain,
$\mathbb{C}^4$, is four, while the codomain, $\mathbb{C}^3$, has dimension three. Then

\begin{align*}
n(T) &= \dim(\mathbb{C}^4) - r(T) && \text{Theorem RPNDD} \\
&= 4 - \dim(\mathcal{R}(T)) && \text{Definition ROLT} \\
&\ge 4 - 3 && \mathcal{R}(T) \text{ subspace of } \mathbb{C}^3 \\
&= 1
\end{align*}

So the kernel of $T$ is nontrivial simply by considering the dimensions of the
domain (number of variables) and the codomain (number of equations). Pre-images
of elements of the codomain that are not in the range of $T$ are empty (inconsistent
systems). For elements of the codomain that are in the range of $T$ (consistent
systems), Theorem KPI tells us that the pre-images are built from the kernel, and
with a nontrivial kernel, these pre-images are infinite (infinitely many solutions).

When do systems of equations have unique solutions? Consider the system of
linear equations $\mathcal{LS}(C, \mathbf{f})$ and the linear transformation $S(\mathbf{x}) = C\mathbf{x}$. If $S$ has a
trivial kernel, then pre-images will either be empty or be finite sets with single
elements. Correspondingly, the coefficient matrix $C$ will have a trivial null space
and solution sets will either be empty (inconsistent) or contain a single solution
(unique solution). Should the matrix be square and have a trivial null space then
we recognize the matrix as being nonsingular. A square matrix means that the
corresponding linear transformation, $S$, has equal-sized domain and codomain. With
a nullity of zero, $S$ is injective, and also Theorem RPNDD tells us that the rank of $S$ is
equal to the dimension of the domain, which in turn is equal to the dimension of
the codomain. In other words, $S$ is surjective. Being injective and surjective, $S$ is
invertible by Theorem ILTIS. Just as we can use the inverse of the coefficient
matrix to find the unique solution of any linear system with a nonsingular coefficient
matrix (Theorem SNCM), we can use the inverse of the linear transformation to
construct the unique element of any pre-image (proof of Theorem ILTIS).

The  executive  summary  of  this  discussion  is  that  to  every  coefficient  matrix 
of  a system  of  linear  equations  we  can  associate  a natural  linear  transformation. 
Solution  sets  for  systems  with  this  coefficient  matrix  are  preimages  of  elements  of  the 
codomain  of  the  linear  transformation.  For  every  theorem  about  systems  of  linear 
equations there is an analogue about linear transformations. The theory of linear
transformations provides all the tools to recreate the theory of solutions to linear
systems  of  equations. 

We  will  continue  this  adventure  in  Chapter  R. 

Reading  Questions 

1.  What  conditions  allow  us  to  easily  determine  if  a linear  transformation  is  invertible? 

2.  What  does  it  mean  to  say  two  vector  spaces  are  isomorphic?  Both  technically,  and 
informally? 

3.  How  do  linear  transformations  relate  to  systems  of  linear  equations? 


Exercises 

C10 The archetypes below are linear transformations of the form $T\colon U \to V$ that are
invertible. For each, the inverse linear transformation is given explicitly as part of the
archetype’s description. Verify for each linear transformation that

\[ T^{-1} \circ T = I_U \qquad T \circ T^{-1} = I_V \]

Archetype R, Archetype V, Archetype W

C20† Determine if the linear transformation $T\colon P_2 \to M_{22}$ is (a) injective, (b) surjective,
(c) invertible.

\[
T(a + bx + cx^2) = \begin{bmatrix}
a + 2b - 2c & 2a + 2b \\
-a + b - 4c & 3a + 2b + 2c
\end{bmatrix}
\]

C21† Determine if the linear transformation $S\colon P_3 \to M_{22}$ is (a) injective, (b) surjective,
(c) invertible.

\[
S(a + bx + cx^2 + dx^3) = \begin{bmatrix}
-a + 4b + c + 2d & 4a - b + 6c - d \\
a + 5b - 2c + 2d & a + 2c + 5d
\end{bmatrix}
\]


C25  For  each  linear  transformation  below:  (a)  Find  the  matrix  representation  of  T,  (b) 
Calculate  n(T),  (c)  Calculate  r(T),  (d)  Graph  the  image  in  either  R2  or  R3  as  appropriate, 
(e)  How  many  dimensions  are  lost?,  and  (f)  How  many  dimensions  are  preserved? 


1.  T:  C3  ->  C3  given  by  T 


x 

x 

X 


2.  T:  C3  ->■  C3  given  by  T 


x 

V 

0 


x 

y 

z 


3. $T\colon \mathbb{C}^3 \to \mathbb{C}^2$ given by $T$
4. $T\colon \mathbb{C}^3 \to \mathbb{C}^2$ given by $T$


x 

V 

0 


x + y_ 

C50† Consider the linear transformation $S\colon M_{12} \to P_1$ from the set of $1 \times 2$ matrices to
the set of polynomials of degree at most 1, defined by

\[ S([a \;\; b]) = (3a + b) + (5a + 2b)x \]

Prove that $S$ is invertible. Then show that the linear transformation

\[ R\colon P_1 \to M_{12}, \quad R(r + sx) = [(2r - s) \;\; (-5r + 3s)] \]

is the inverse of $S$, that is $S^{-1} = R$.

M30† The linear transformation $S$ below is invertible. Find a formula for the inverse
linear transformation, $S^{-1}$.

\[ S\colon P_1 \to M_{12}, \quad S(a + bx) = [3a + b \;\; 2a + b] \]

M31† The linear transformation $R\colon M_{12} \to M_{21}$ is invertible. Determine a formula for
the inverse linear transformation $R^{-1}\colon M_{21} \to M_{12}$.

\[ R([a \;\; b]) = \begin{bmatrix} a + 3b \\ 4a + 11b \end{bmatrix} \]

M50 Rework Example CIVLT, only in place of the basis $B$ for $P_2$, choose instead to use
the basis $C = \{1,\ 1 + x,\ 1 + x + x^2\}$. This will complicate writing a generic element of the
domain of $T^{-1}$ as a linear combination of the basis elements, and the algebra will be a bit
messier, but in the end you should obtain the same formula for $T^{-1}$. The inverse linear
transformation is what it is, and the choice of a particular basis should not influence the
outcome.

M60 Suppose $U$ and $V$ are vector spaces. Define the function $Z\colon U \to V$ by $Z(\mathbf{u}) = \mathbf{0}_V$
for every $\mathbf{u} \in U$. Then by Exercise LT.M60, $Z$ is a linear transformation. Formulate a
condition on $U$ and $V$ that is equivalent to $Z$ being an invertible linear transformation. In
other words, fill in the blank to complete the following statement (and then give a proof):
$Z$ is invertible if and only if $U$ and $V$ are ______. (See Exercise ILT.M60, Exercise SLT.M60,
Exercise MR.M60.)

T05  Prove  that  the  identity  linear  transformation  (Definition  IDLT)  is  both  injective  and 
surjective,  and  hence  invertible. 

T15† Suppose that $T\colon U \to V$ is a surjective linear transformation and $\dim(U) = \dim(V)$.
Prove that $T$ is injective.


5. $T\colon \mathbb{C}^2 \to \mathbb{C}^3$ given by $T$

6. $T\colon \mathbb{C}^2 \to \mathbb{C}^3$ given by $T$

T16 Suppose that $T\colon U \to V$ is an injective linear transformation and $\dim(U) = \dim(V)$.
Prove that $T$ is surjective.

T30† Suppose that $U$ and $V$ are isomorphic vector spaces, not of dimension zero. Prove
that there are infinitely many isomorphisms between $U$ and $V$.

T40† Suppose $T\colon U \to V$ and $S\colon V \to W$ are linear transformations and $\dim(U) =
\dim(V) = \dim(W)$. Suppose that $S \circ T$ is invertible. Prove that $S$ and $T$ are individually
invertible (this could be construed as a converse of Theorem CIVLT).


Chapter  R 
Representations 


Previous  work  with  linear  transformations  may  have  convinced  you  that  we  can 
convert  most  questions  about  linear  transformations  into  questions  about  systems  of 
equations or properties of subspaces of $\mathbb{C}^m$. In this section we begin to make these
vague  notions  precise.  We  have  used  the  word  “representation”  prior,  but  it  will  get 
a heavy  workout  in  this  chapter.  In  many  ways,  everything  we  have  studied  so  far 
was  in  preparation  for  this  chapter. 


Section  VR 

Vector  Representations 

You  may  have  noticed  that  many  questions  about  elements  of  abstract  vector  spaces 
eventually  become  questions  about  column  vectors  or  systems  of  equations.  Example 
SM32  would  be  an  example  of  this.  We  will  make  this  vague  idea  more  precise  in 
this  section. 

Subsection  VR 
Vector  Representation 

We  begin  by  establishing  an  invertible  linear  transformation  between  any  vector 
space $V$ of dimension $n$ and $\mathbb{C}^n$. This will allow us to “go back and forth” between
the  two  vector  spaces,  no  matter  how  abstract  the  definition  of  V might  be. 

Definition  VR  Vector  Representation 

Suppose that $V$ is a vector space with a basis $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_n\}$. Define a
function $\rho_B\colon V \to \mathbb{C}^n$ as follows. For $\mathbf{w} \in V$ define the column vector $\rho_B(\mathbf{w}) \in \mathbb{C}^n$
by

\[ \mathbf{w} = [\rho_B(\mathbf{w})]_1 \mathbf{v}_1 + [\rho_B(\mathbf{w})]_2 \mathbf{v}_2 + [\rho_B(\mathbf{w})]_3 \mathbf{v}_3 + \cdots + [\rho_B(\mathbf{w})]_n \mathbf{v}_n \]

This definition looks more complicated than it really is, though the form above will
be useful in proofs. Simply stated, given $\mathbf{w} \in V$, we write $\mathbf{w}$ as a linear combination
of the basis elements of $B$. It is key to realize that Theorem VRRB guarantees that
we can do this for every $\mathbf{w}$, and furthermore this expression as a linear combination is
unique. The resulting scalars are just the entries of the vector $\rho_B(\mathbf{w})$. This discussion
should convince you that $\rho_B$ is “well-defined” as a function. We can determine a
precise output for any input. Now we want to establish that $\rho_B$ is a function with
additional properties: it is a linear transformation.

Theorem  VRLT  Vector  Representation  is  a Linear  Transformation 
The function $\rho_B$ (Definition VR) is a linear transformation.

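Proof. We will take a novel approach in this proof. We will construct another function,
which we will easily determine is a linear transformation, and then show that this
second function is really $\rho_B$ in disguise. Here we go.

Since $B$ is a basis, we can define $T\colon V \to \mathbb{C}^n$ to be the unique linear transformation
such that $T(\mathbf{v}_i) = \mathbf{e}_i$, $1 \le i \le n$, as guaranteed by Theorem LTDB, and where the
$\mathbf{e}_i$ are the standard unit vectors (Definition SUV). Then suppose for an arbitrary
$\mathbf{w} \in V$ we have,

\begin{align*}
[T(\mathbf{w})]_i &= \left[ T\left( \sum_{j=1}^{n} [\rho_B(\mathbf{w})]_j \mathbf{v}_j \right) \right]_i && \text{Definition VR} \\
&= \left[ \sum_{j=1}^{n} [\rho_B(\mathbf{w})]_j T(\mathbf{v}_j) \right]_i && \text{Theorem LTLC} \\
&= \left[ \sum_{j=1}^{n} [\rho_B(\mathbf{w})]_j \mathbf{e}_j \right]_i \\
&= \sum_{j=1}^{n} \left[ [\rho_B(\mathbf{w})]_j \mathbf{e}_j \right]_i && \text{Definition CVA} \\
&= \sum_{j=1}^{n} [\rho_B(\mathbf{w})]_j [\mathbf{e}_j]_i && \text{Definition CVSM} \\
&= [\rho_B(\mathbf{w})]_i [\mathbf{e}_i]_i + \sum_{\substack{j=1 \\ j \ne i}}^{n} [\rho_B(\mathbf{w})]_j [\mathbf{e}_j]_i && \text{Property CC} \\
&= [\rho_B(\mathbf{w})]_i (1) + \sum_{\substack{j=1 \\ j \ne i}}^{n} [\rho_B(\mathbf{w})]_j (0) && \text{Definition SUV} \\
&= [\rho_B(\mathbf{w})]_i
\end{align*}

As column vectors, Definition CVE implies that $T(\mathbf{w}) = \rho_B(\mathbf{w})$. Since $\mathbf{w}$ was an
arbitrary element of $V$, as functions $T = \rho_B$. Now, since $T$ is known to be a linear
transformation, it must follow that $\rho_B$ is also a linear transformation. ■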


The proof of Theorem VRLT provides an alternate definition of vector representation
relative to a basis $B$ that we could state as a corollary (Proof Technique LC):
$\rho_B$ is the unique linear transformation that takes $B$ to the standard unit basis.


Example  VRC4  Vector  representation  in  C4 
Consider the vector $\mathbf{y} \in \mathbb{C}^4$

\[ \mathbf{y} = \begin{bmatrix} 6 \\ 14 \\ 6 \\ 7 \end{bmatrix} \]


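We will find several vector representations of $\mathbf{y}$ in this example. Notice that $\mathbf{y}$
never changes, but the representations of $\mathbf{y}$ do change. One basis for $\mathbb{C}^4$ is

\[
B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \mathbf{u}_4\} = \left\{
\begin{bmatrix} -2 \\ 1 \\ 2 \\ -3 \end{bmatrix},\
\begin{bmatrix} 3 \\ -6 \\ 2 \\ -4 \end{bmatrix},\ \ldots
\right\}
\]
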

as can be seen by making these vectors the columns of a matrix, checking that the
matrix is nonsingular and applying Theorem CNMB. To find $\rho_B(\mathbf{y})$, we need to
find scalars, $a_1, a_2, a_3, a_4$ such that

\[ \mathbf{y} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + a_4\mathbf{u}_4 \]

By Theorem SLSLC the desired scalars are a solution to the linear system of
equations with a coefficient matrix whose columns are the vectors in $B$ and with a
vector of constants $\mathbf{y}$. With a nonsingular coefficient matrix, the solution is unique,
but this is no surprise as this is the content of Theorem VRRB. This unique solution
is

\[ a_1 = 2 \qquad a_2 = -1 \qquad a_3 = -3 \qquad a_4 = 4 \]

Then by Definition VR, we have

\[ \rho_B(\mathbf{y}) = \begin{bmatrix} 2 \\ -1 \\ -3 \\ 4 \end{bmatrix} \]

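Suppose now that we construct a representation of $\mathbf{y}$ relative to another basis of
$\mathbb{C}^4$,

\[
C = \left\{
\begin{bmatrix} -15 \\ 9 \\ -4 \\ -2 \end{bmatrix},\
\begin{bmatrix} 16 \\ -14 \\ 5 \\ 2 \end{bmatrix},\
\begin{bmatrix} -26 \\ 14 \\ -6 \\ -3 \end{bmatrix},\
\begin{bmatrix} 14 \\ -13 \\ 4 \\ 6 \end{bmatrix}
\right\}
\]
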


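As with $B$, it is easy to check that $C$ is a basis. Writing $\mathbf{y}$ as a linear combination
of the vectors in $C$ leads to solving a system of four equations in the four unknown
scalars with a nonsingular coefficient matrix. The unique solution can be expressed
as

\[
\mathbf{y} = \begin{bmatrix} 6 \\ 14 \\ 6 \\ 7 \end{bmatrix}
= (-28) \begin{bmatrix} -15 \\ 9 \\ -4 \\ -2 \end{bmatrix}
+ (-8) \begin{bmatrix} 16 \\ -14 \\ 5 \\ 2 \end{bmatrix}
+ 11 \begin{bmatrix} -26 \\ 14 \\ -6 \\ -3 \end{bmatrix}
+ 0 \begin{bmatrix} 14 \\ -13 \\ 4 \\ 6 \end{bmatrix}
\]

so that Definition VR gives

\[ \rho_C(\mathbf{y}) = \begin{bmatrix} -28 \\ -8 \\ 11 \\ 0 \end{bmatrix} \]
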

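We often perform representations relative to standard bases, but for vectors in
$\mathbb{C}^m$ this is a little silly. Let us find the vector representation of $\mathbf{y}$ relative to the
standard basis (Theorem SUVB),

\[ D = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \mathbf{e}_4\} \]

Then, without any computation, we can check that

\[ \mathbf{y} = \begin{bmatrix} 6 \\ 14 \\ 6 \\ 7 \end{bmatrix} = 6\mathbf{e}_1 + 14\mathbf{e}_2 + 6\mathbf{e}_3 + 7\mathbf{e}_4 \]

so by Definition VR,

\[ \rho_D(\mathbf{y}) = \begin{bmatrix} 6 \\ 14 \\ 6 \\ 7 \end{bmatrix} \]
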

which is not very exciting. Notice however that the order in which we place the
vectors in the basis is critical to the representation. Let us keep the standard unit
vectors as our basis, but rearrange the order we place them in the basis. So a fourth
basis is

\[ E = \{\mathbf{e}_3, \mathbf{e}_4, \mathbf{e}_2, \mathbf{e}_1\} \]

Then,

\[ \mathbf{y} = \begin{bmatrix} 6 \\ 14 \\ 6 \\ 7 \end{bmatrix} = 6\mathbf{e}_3 + 7\mathbf{e}_4 + 14\mathbf{e}_2 + 6\mathbf{e}_1 \]

so by Definition VR,

\[ \rho_E(\mathbf{y}) = \begin{bmatrix} 6 \\ 7 \\ 14 \\ 6 \end{bmatrix} \]

So for every possible basis of $\mathbb{C}^4$ we could construct a different representation of
$\mathbf{y}$. △
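
Finding a vector representation in $\mathbb{C}^4$ amounts to solving a square linear system whose columns are the basis vectors. The sketch below is one possible way to carry that out, not the method used in the text: `solve` is a hypothetical helper performing Gauss-Jordan elimination with exact fractions, and it assumes the basis vectors really do form a nonsingular matrix. It recomputes the representations relative to the bases $C$ and $E$ of Example VRC4.

```python
from fractions import Fraction

def solve(columns, target):
    """Solve a1*c1 + ... + an*cn = target by Gauss-Jordan elimination.

    Assumes the matrix with the given columns is square and nonsingular,
    which is the case whenever the columns form a basis of C^n.
    """
    n = len(columns)
    # Augmented matrix [c1 ... cn | target], assembled row by row.
    m = [[Fraction(col[i]) for col in columns] + [Fraction(target[i])]
         for i in range(n)]
    for c in range(n):
        # Swap in a row with a nonzero entry in column c.
        p = next(i for i in range(c, n) if m[i][c] != 0)
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [x / pivot for x in m[c]]
        for i in range(n):
            if i != c:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return [row[n] for row in m]

y = [6, 14, 6, 7]

# Basis C from the example, one list per basis vector.
C = [[-15, 9, -4, -2],
     [16, -14, 5, 2],
     [-26, 14, -6, -3],
     [14, -13, 4, 6]]

# Basis E: the standard unit vectors in the order e3, e4, e2, e1.
E = [[0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 1, 0, 0],
     [1, 0, 0, 0]]

rho_C = solve(C, y)
rho_E = solve(E, y)

print([int(a) for a in rho_C])
print([int(a) for a in rho_E])
```

The second call shows that reordering a basis permutes the entries of the representation even though the vector itself is unchanged.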


Vector representations are most interesting for vector spaces that are not $\mathbb{C}^m$.


Example  VRP2  Vector  representations  in  P2 

Consider the vector $\mathbf{u} = 15 + 10x - 6x^2 \in P_2$ from the vector space of polynomials
with degree at most 2 (Example VSP). A nice basis for $P_2$ is

\[ B = \{1,\ x,\ x^2\} \]

so that

\[ \mathbf{u} = 15 + 10x - 6x^2 = 15(1) + 10(x) + (-6)(x^2) \]

so by Definition VR

\[ \rho_B(\mathbf{u}) = \begin{bmatrix} 15 \\ 10 \\ -6 \end{bmatrix} \]


Another nice basis for $P_2$ is

\[ C = \{1,\ 1 + x,\ 1 + x + x^2\} \]

so that now it takes a bit of computation to determine the scalars for the
representation. We want $a_1, a_2, a_3$ so that

\[ 15 + 10x - 6x^2 = a_1(1) + a_2(1 + x) + a_3(1 + x + x^2) \]

Performing the operations in $P_2$ on the right-hand side, and equating coefficients,
gives the three equations in the three unknown scalars,

\begin{align*}
15 &= a_1 + a_2 + a_3 \\
10 &= a_2 + a_3 \\
-6 &= a_3
\end{align*}

The coefficient matrix of this system is nonsingular, leading to a unique solution
(no surprise there, see Theorem VRRB),

\[ a_1 = 5 \qquad a_2 = 16 \qquad a_3 = -6 \]

so by Definition VR

\[ \rho_C(\mathbf{u}) = \begin{bmatrix} 5 \\ 16 \\ -6 \end{bmatrix} \]


While we often form vector representations relative to “nice” bases, nothing
prevents us from forming representations relative to “nasty” bases. For example, the
set

\[ D = \{-2 - x + 3x^2,\ 1 - 2x^2,\ 5 + 4x + x^2\} \]

can be verified as a basis of $P_2$ by checking linear independence with Definition LI
and then arguing that 3 vectors from $P_2$, a vector space of dimension 3 (Theorem
DP), must also be a spanning set (Theorem G).

Now we desire scalars $a_1, a_2, a_3$ so that

\[ 15 + 10x - 6x^2 = a_1(-2 - x + 3x^2) + a_2(1 - 2x^2) + a_3(5 + 4x + x^2) \]

Performing the operations in $P_2$ on the right-hand side, and equating coefficients,
gives the three equations in the three unknown scalars,

\begin{align*}
15 &= -2a_1 + a_2 + 5a_3 \\
10 &= -a_1 + 4a_3 \\
-6 &= 3a_1 - 2a_2 + a_3
\end{align*}

The coefficient matrix of this system is nonsingular, leading to a unique solution
(no surprise there, see Theorem VRRB),

\[ a_1 = -2 \qquad a_2 = 1 \qquad a_3 = 2 \]

so by Definition VR

\[ \rho_D(\mathbf{u}) = \begin{bmatrix} -2 \\ 1 \\ 2 \end{bmatrix} \]

△
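
The three representations in Example VRP2 can be double-checked by running the definition backwards: take the claimed scalars, form the corresponding linear combination of basis polynomials, and compare with $\mathbf{u}$. In the sketch below a polynomial $c_0 + c_1x + c_2x^2$ is stored as the list `[c0, c1, c2]`; this encoding and the helper name `combine` are choices made for the illustration only.

```python
def combine(basis, coords):
    """Return coords[0]*basis[0] + coords[1]*basis[1] + ... as [c0, c1, c2]."""
    return [sum(a * p[k] for a, p in zip(coords, basis)) for k in range(3)]

u = [15, 10, -6]                      # 15 + 10x - 6x^2

# The three bases of P_2 used in the example, as coefficient lists.
B = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]         # 1, x, x^2
C = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]         # 1, 1+x, 1+x+x^2
D = [[-2, -1, 3], [1, 0, -2], [5, 4, 1]]      # the "nasty" basis

# Representations claimed in the example.
rho_B = [15, 10, -6]
rho_C = [5, 16, -6]
rho_D = [-2, 1, 2]

# Each linear combination should rebuild the same polynomial u.
rebuilt = [combine(B, rho_B), combine(C, rho_C), combine(D, rho_D)]
print(all(p == u for p in rebuilt))
```

Checking a proposed representation this way needs no equation solving at all, which makes it a cheap sanity test after any hand computation.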


Theorem  VRI  Vector  Representation  is  Injective 

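The function $\rho_B$ (Definition VR) is an injective linear transformation.


Proof. We will appeal to Theorem KILT. Suppose $U$ is a vector space of dimension
$n$, so vector representation is $\rho_B\colon U \to \mathbb{C}^n$. Let $B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_n\}$ be the
basis of $U$ used in the definition of $\rho_B$. Suppose $\mathbf{u} \in \mathcal{K}(\rho_B)$. We write $\mathbf{u}$ as a linear
combination of the vectors in the basis $B$ where the scalars are the components of
the vector representation, $\rho_B(\mathbf{u})$.

\begin{align*}
\mathbf{u} &= [\rho_B(\mathbf{u})]_1 \mathbf{u}_1 + [\rho_B(\mathbf{u})]_2 \mathbf{u}_2 + \cdots + [\rho_B(\mathbf{u})]_n \mathbf{u}_n && \text{Definition VR} \\
&= [\mathbf{0}]_1 \mathbf{u}_1 + [\mathbf{0}]_2 \mathbf{u}_2 + \cdots + [\mathbf{0}]_n \mathbf{u}_n && \text{Definition KLT} \\
&= 0\mathbf{u}_1 + 0\mathbf{u}_2 + \cdots + 0\mathbf{u}_n && \text{Definition ZCV} \\
&= \mathbf{0} + \mathbf{0} + \cdots + \mathbf{0} && \text{Theorem ZSSM} \\
&= \mathbf{0} && \text{Property Z}
\end{align*}

Thus an arbitrary vector, $\mathbf{u}$, from the kernel $\mathcal{K}(\rho_B)$, must equal the zero vector
of $U$. So $\mathcal{K}(\rho_B) = \{\mathbf{0}\}$ and by Theorem KILT, $\rho_B$ is injective. ■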

Theorem  VRS  Vector  Representation  is  Surjective 

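The function $\rho_B$ (Definition VR) is a surjective linear transformation.

Proof. We will appeal to Theorem RSLT. Suppose $U$ is a vector space of dimension
$n$, so vector representation is $\rho_B\colon U \to \mathbb{C}^n$. Let $B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_n\}$ be the
basis of $U$ used in the definition of $\rho_B$. Suppose $\mathbf{v} \in \mathbb{C}^n$. Define the vector $\mathbf{u}$ by

\[ \mathbf{u} = [\mathbf{v}]_1 \mathbf{u}_1 + [\mathbf{v}]_2 \mathbf{u}_2 + [\mathbf{v}]_3 \mathbf{u}_3 + \cdots + [\mathbf{v}]_n \mathbf{u}_n \]

Then for $1 \le i \le n$, by Definition VR,

\[ [\rho_B(\mathbf{u})]_i = [\rho_B([\mathbf{v}]_1 \mathbf{u}_1 + [\mathbf{v}]_2 \mathbf{u}_2 + [\mathbf{v}]_3 \mathbf{u}_3 + \cdots + [\mathbf{v}]_n \mathbf{u}_n)]_i = [\mathbf{v}]_i \]

so the entries of vectors $\rho_B(\mathbf{u})$ and $\mathbf{v}$ are equal and Definition CVE yields the vector
equality $\rho_B(\mathbf{u}) = \mathbf{v}$. This demonstrates that $\mathbf{v} \in \mathcal{R}(\rho_B)$, so $\mathbb{C}^n \subseteq \mathcal{R}(\rho_B)$. Since
$\mathcal{R}(\rho_B) \subseteq \mathbb{C}^n$ by Definition RLT, we have $\mathcal{R}(\rho_B) = \mathbb{C}^n$ and Theorem RSLT says $\rho_B$
is surjective. ■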

We  will  have  many  occasions  later  to  employ  the  inverse  of  vector  representation,  so 
we  will  record  the  fact  that  vector  representation  is  an  invertible  linear  transformation. 

Theorem  VRILT  Vector  Representation  is  an  Invertible  Linear  Transformation 
The function $\rho_B$ (Definition VR) is an invertible linear transformation.

Proof. The function $\rho_B$ (Definition VR) is a linear transformation (Theorem VRLT)
that is injective (Theorem VRI) and surjective (Theorem VRS) with domain $V$
and codomain $\mathbb{C}^n$. By Theorem ILTIS we then know that $\rho_B$ is an invertible linear
transformation. ■

Informally, we will refer to the application of $\rho_B$ as coordinatizing a vector,
while the application of $\rho_B^{-1}$ will be referred to as un-coordinatizing a vector.
Subsection CVS

Characterization  of  Vector  Spaces 

Limiting  our  attention  to  vector  spaces  with  finite  dimension,  we  now  describe  every 
possible  vector  space.  All  of  them.  Really. 

Theorem  CFDVS  Characterization  of  Finite  Dimensional  Vector  Spaces 
Suppose that $V$ is a vector space with dimension $n$. Then $V$ is isomorphic to $\mathbb{C}^n$.

Proof. Since $V$ has dimension $n$ we can find a basis of $V$ of size $n$ (Definition D) which
we will call $B$. The linear transformation $\rho_B$ is an invertible linear transformation
from $V$ to $\mathbb{C}^n$, so by Definition IVS, we have that $V$ and $\mathbb{C}^n$ are isomorphic. ■

Theorem CFDVS is the first of several surprises in this chapter, though it might
be a bit demoralizing too. It says that there really are not all that many different
(finite dimensional) vector spaces, and none are really any more complicated than
$\mathbb{C}^n$. Hmmmm. The following examples should make this point.

Example  TIVS  Two  isomorphic  vector  spaces 

The vector space of polynomials with degree 8 or less, $P_8$, has dimension 9 (Theorem
DP). By Theorem CFDVS, $P_8$ is isomorphic to $\mathbb{C}^9$. △

Example CVSR Crazy vector space revealed

The crazy vector space, $C$ of Example CVS, has dimension 2 by Example DC. By
Theorem CFDVS, $C$ is isomorphic to $\mathbb{C}^2$. Hmmmm. Not really so crazy after all? △

Example ASC A subspace characterized

In Example DSP4 we determined that a certain subspace $W$ of $P_4$ has dimension 4.
By Theorem CFDVS, $W$ is isomorphic to $\mathbb{C}^4$. △

Theorem  IFDVS  Isomorphism  of  Finite  Dimensional  Vector  Spaces 

Suppose $U$ and $V$ are both finite-dimensional vector spaces. Then $U$ and $V$ are
isomorphic if and only if $\dim(U) = \dim(V)$.

Proof. ($\Rightarrow$) This is just the statement proved in Theorem IVSED.

($\Leftarrow$) This is the advertised converse of Theorem IVSED. We will assume $U$ and
$V$ have equal dimension and discover that they are isomorphic vector spaces. Let
$n$ be the common dimension of $U$ and $V$. Then by Theorem CFDVS there are
isomorphisms $T\colon U \to \mathbb{C}^n$ and $S\colon V \to \mathbb{C}^n$.

$T$ is therefore an invertible linear transformation by Definition IVS. Similarly, $S$ is
an invertible linear transformation, and so $S^{-1}$ is an invertible linear transformation
(Theorem IILT). The composition of invertible linear transformations is again
invertible (Theorem CIVLT) so the composition of $S^{-1}$ with $T$ is invertible. Then
$(S^{-1} \circ T)\colon U \to V$ is an invertible linear transformation from $U$ to $V$ and Definition
IVS says $U$ and $V$ are isomorphic. ■
Example MIVS Multiple isomorphic vector spaces

$\mathbb{C}^{10}$, $P_9$, $M_{25}$ and $M_{52}$ are all vector spaces and each has dimension 10. By Theorem
IFDVS each is isomorphic to any other.

The subspace of $M_{44}$ that contains all the symmetric matrices (Definition SYM)
has dimension 10, so this subspace is also isomorphic to each of the four vector
spaces above. △


Subsection  CP 
Coordinatization  Principle 

With $\rho_B$ available as an invertible linear transformation, we can translate between
vectors in a vector space $U$ of dimension $m$ and $\mathbb{C}^m$. Furthermore, as a linear
transformation, $\rho_B$ respects the addition and scalar multiplication in $U$, while $\rho_B^{-1}$
respects the addition and scalar multiplication in $\mathbb{C}^m$. Since our definitions of linear
independence, spans, bases and dimension are all built up from linear combinations,
we will finally be able to translate fundamental properties between abstract vector
spaces ($U$) and concrete vector spaces ($\mathbb{C}^m$).

Theorem  CLI  Coordinatization  and  Linear  Independence 
Suppose that $U$ is a vector space with a basis $B$ of size $n$. Then

\[ S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_k\} \]
is a linearly independent subset of $U$ if and only if

\[ R = \{\rho_B(\mathbf{u}_1), \rho_B(\mathbf{u}_2), \rho_B(\mathbf{u}_3), \ldots, \rho_B(\mathbf{u}_k)\} \]
is a linearly independent subset of $\mathbb{C}^n$.

Proof. The linear transformation $\rho_B$ is an isomorphism between $U$ and $\mathbb{C}^n$ (Theorem
VRILT). As an invertible linear transformation, $\rho_B$ is an injective linear transformation
(Theorem ILTIS), and $\rho_B^{-1}$ is also an injective linear transformation (Theorem
IILT, Theorem ILTIS).

($\Rightarrow$) Since $\rho_B$ is an injective linear transformation and $S$ is linearly independent,
Theorem ILTLI says that $R$ is linearly independent.

($\Leftarrow$) If we apply $\rho_B^{-1}$ to each element of $R$, we will create the set $S$. Since we are
assuming $R$ is linearly independent and $\rho_B^{-1}$ is injective, Theorem ILTLI says that $S$
is linearly independent. ■


Theorem  CSS  Coordinatization  and  Spanning  Sets 
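Suppose that $U$ is a vector space with a basis $B$ of size $n$. Then

\[ \mathbf{u} \in \left\langle\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_k\}\right\rangle \]

if and only if

\[ \rho_B(\mathbf{u}) \in \left\langle\{\rho_B(\mathbf{u}_1), \rho_B(\mathbf{u}_2), \rho_B(\mathbf{u}_3), \ldots, \rho_B(\mathbf{u}_k)\}\right\rangle \]

Proof. ($\Rightarrow$) Suppose $\mathbf{u} \in \left\langle\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_k\}\right\rangle$. Then we know there are scalars,
$a_1, a_2, a_3, \ldots, a_k$, such that

\[ \mathbf{u} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_k\mathbf{u}_k \]
Then, by Theorem LTLC,

\begin{align*}
\rho_B(\mathbf{u}) &= \rho_B(a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_k\mathbf{u}_k)\\
&= a_1\rho_B(\mathbf{u}_1) + a_2\rho_B(\mathbf{u}_2) + a_3\rho_B(\mathbf{u}_3) + \cdots + a_k\rho_B(\mathbf{u}_k)
\end{align*}

which says that $\rho_B(\mathbf{u}) \in \left\langle\{\rho_B(\mathbf{u}_1), \rho_B(\mathbf{u}_2), \rho_B(\mathbf{u}_3), \ldots, \rho_B(\mathbf{u}_k)\}\right\rangle$.

($\Leftarrow$) Suppose that $\rho_B(\mathbf{u}) \in \left\langle\{\rho_B(\mathbf{u}_1), \rho_B(\mathbf{u}_2), \rho_B(\mathbf{u}_3), \ldots, \rho_B(\mathbf{u}_k)\}\right\rangle$. Then
there are scalars $b_1, b_2, b_3, \ldots, b_k$ such that

\[ \rho_B(\mathbf{u}) = b_1\rho_B(\mathbf{u}_1) + b_2\rho_B(\mathbf{u}_2) + b_3\rho_B(\mathbf{u}_3) + \cdots + b_k\rho_B(\mathbf{u}_k) \]

Recall that $\rho_B$ is invertible (Theorem VRILT), so

\begin{align*}
\mathbf{u} &= I_U(\mathbf{u}) && \text{Definition IDLT}\\
&= \left(\rho_B^{-1} \circ \rho_B\right)(\mathbf{u}) && \text{Definition IVLT}\\
&= \rho_B^{-1}\left(\rho_B(\mathbf{u})\right) && \text{Definition LTC}\\
&= \rho_B^{-1}\left(b_1\rho_B(\mathbf{u}_1) + b_2\rho_B(\mathbf{u}_2) + \cdots + b_k\rho_B(\mathbf{u}_k)\right)\\
&= b_1\rho_B^{-1}\left(\rho_B(\mathbf{u}_1)\right) + b_2\rho_B^{-1}\left(\rho_B(\mathbf{u}_2)\right) + \cdots + b_k\rho_B^{-1}\left(\rho_B(\mathbf{u}_k)\right) && \text{Theorem LTLC}\\
&= b_1 I_U(\mathbf{u}_1) + b_2 I_U(\mathbf{u}_2) + \cdots + b_k I_U(\mathbf{u}_k) && \text{Definition IVLT}\\
&= b_1\mathbf{u}_1 + b_2\mathbf{u}_2 + b_3\mathbf{u}_3 + \cdots + b_k\mathbf{u}_k && \text{Definition IDLT}
\end{align*}

which says that $\mathbf{u} \in \left\langle\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_k\}\right\rangle$. ■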

Here  is  a fairly  simple  example  that  illustrates  a very,  very  important  idea. 

Example  CP2  Coordinatizing  in  P2 
In Example VRP2 we needed to know that

\[ D = \{-2 - x + 3x^2,\ 1 - 2x^2,\ 5 + 4x + x^2\} \]

is a basis for $P_2$. With Theorem CLI and Theorem CSS this task is much easier.

First, choose a known basis for $P_2$, a basis that forms vector representations
easily. We will choose

\[ B = \{1,\ x,\ x^2\} \]

Now, form the subset of $\mathbb{C}^3$ that is the result of applying $\rho_B$ to each element of $D$,

\[ F = \left\{\rho_B\left(-2 - x + 3x^2\right),\ \rho_B\left(1 - 2x^2\right),\ \rho_B\left(5 + 4x + x^2\right)\right\}
= \left\{\begin{bmatrix}-2\\-1\\3\end{bmatrix},\ \begin{bmatrix}1\\0\\-2\end{bmatrix},\ \begin{bmatrix}5\\4\\1\end{bmatrix}\right\} \]

and ask if $F$ is a linearly independent spanning set for $\mathbb{C}^3$. This is easily seen to be
the case by forming a matrix $A$ whose columns are the vectors of $F$, row-reducing $A$
to the identity matrix $I_3$, and then using the nonsingularity of $A$ to assert that $F$ is
a basis for $\mathbb{C}^3$ (Theorem CNMB). Now, since $F$ is a basis for $\mathbb{C}^3$, Theorem CLI and
Theorem CSS tell us that $D$ is also a basis for $P_2$. △
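The check described in Example CP2 can be sketched in a few lines of code. This is only an illustration in plain Python (not Sage, and not part of the original text); the helper name `rref` and the encoding of a polynomial as its (constant, $x$, $x^2$) coefficients are assumptions made for this sketch. It row-reduces the matrix whose columns are the coordinate vectors in $F$ and tests whether the result is the identity matrix.

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced row-echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pr = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot_row], m[pr] = m[pr], m[pivot_row]
        pivot = m[pivot_row][col]
        m[pivot_row] = [x / pivot for x in m[pivot_row]]
        # Clear the rest of the pivot column.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

# Coordinates of the polynomials of D relative to B = {1, x, x^2}.
F = [(-2, -1, 3), (1, 0, -2), (5, 4, 1)]

# The vectors of F become the columns of A.
A = [[v[i] for v in F] for i in range(3)]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
is_basis = rref(A) == identity
print(is_basis)
```

If `is_basis` is true, then $A$ is nonsingular, $F$ is a basis of $\mathbb{C}^3$, and so $D$ is a basis of $P_2$ by the argument of the example.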


Example CP2 illustrates the broad notion that computations in abstract vector
spaces can be reduced to computations in $\mathbb{C}^m$. You may have noticed this phenomenon
as you worked through examples in Chapter VS or Chapter LT employing vector
spaces of matrices or polynomials. These computations seemed to invariably result
in systems of equations or the like from Chapter SLE, Chapter V and Chapter M.
It is vector representation, $\rho_B$, that allows us to make this connection formal and
precise.

Knowing that vector representation allows us to translate questions about linear
combinations, linear independence and spans from general vector spaces to $\mathbb{C}^m$ allows
us to prove a great many theorems about how to translate other properties. Rather
than prove these theorems, each of the same style as the other, we will offer some
general guidance about how to best employ Theorem VRLT, Theorem CLI and
Theorem CSS. This comes in the form of a “principle”: a basic truth, but most
definitely not a theorem (hence, no proof).

The  Coordinatization  Principle 

Suppose that $U$ is a vector space with a basis $B$ of size $n$. Then any question
about $U$, or its elements, which ultimately depends on the vector addition or scalar
multiplication in $U$, or depends on linear independence or spanning, may be translated
into the same question in $\mathbb{C}^n$ by application of the linear transformation $\rho_B$ to the
relevant vectors. Once the question is answered in $\mathbb{C}^n$, the answer may be translated
back to $U$ through application of the inverse linear transformation $\rho_B^{-1}$ (if necessary).


Example  CM32  Coordinatization  in  M32 

This is a simple example of the Coordinatization Principle, depending only on the fact
that coordinatizing is an invertible linear transformation (Theorem VRILT). Suppose
we have a linear combination to perform in $M_{32}$, the vector space of $3 \times 2$ matrices,
but we are averse to doing the operations of $M_{32}$ (Definition MA, Definition MSM).
More specifically, suppose we are faced with the computation


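\[ 6\begin{bmatrix} 3 & 7\\ -2 & 4\\ 0 & -3 \end{bmatrix} + 2\begin{bmatrix} -1 & 3\\ 4 & 8\\ -2 & 5 \end{bmatrix} \]

We choose a nice basis for $M_{32}$ (or a nasty basis if we are so inclined),

\[ B = \left\{
\begin{bmatrix}1&0\\0&0\\0&0\end{bmatrix},\
\begin{bmatrix}0&0\\1&0\\0&0\end{bmatrix},\
\begin{bmatrix}0&0\\0&0\\1&0\end{bmatrix},\
\begin{bmatrix}0&1\\0&0\\0&0\end{bmatrix},\
\begin{bmatrix}0&0\\0&1\\0&0\end{bmatrix},\
\begin{bmatrix}0&0\\0&0\\0&1\end{bmatrix}
\right\} \]
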

and apply $\rho_B$ to each vector in the linear combination. This gives us a new computation,
now in the vector space $\mathbb{C}^6$, which we can compute with operations in $\mathbb{C}^6$
(Definition CVA, Definition CVSM),


\[ 6\begin{bmatrix}3\\-2\\0\\7\\4\\-3\end{bmatrix} + 2\begin{bmatrix}-1\\4\\-2\\3\\8\\5\end{bmatrix} = \begin{bmatrix}16\\-4\\-4\\48\\40\\-8\end{bmatrix} \]

We are after the result of a computation in $M_{32}$, so we now can apply $\rho_B^{-1}$ to
obtain a $3 \times 2$ matrix,


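\begin{align*}
&16\begin{bmatrix}1&0\\0&0\\0&0\end{bmatrix}
+ (-4)\begin{bmatrix}0&0\\1&0\\0&0\end{bmatrix}
+ (-4)\begin{bmatrix}0&0\\0&0\\1&0\end{bmatrix}
+ 48\begin{bmatrix}0&1\\0&0\\0&0\end{bmatrix}
+ 40\begin{bmatrix}0&0\\0&1\\0&0\end{bmatrix}
+ (-8)\begin{bmatrix}0&0\\0&0\\0&1\end{bmatrix}\\
&\quad= \begin{bmatrix}16&48\\-4&40\\-4&-8\end{bmatrix}
\end{align*}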


which is exactly the matrix we would have computed had we just performed the
matrix operations in the first place. So this was not meant to be an easier way to
compute a linear combination of two matrices, just a different way. △
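The round trip of Example CM32 (coordinatize, compute in $\mathbb{C}^6$, un-coordinatize) can be sketched as follows. This is a hedged illustration in plain Python; the function names `coordinatize` and `uncoordinatize` are invented for the sketch, and the column-by-column ordering of the basis matrices is an assumption chosen to match the column vectors shown in the example.

```python
def coordinatize(matrix):
    """rho_B for the basis of M32 ordered column by column:
    entries (1,1), (2,1), (3,1), (1,2), (2,2), (3,2)."""
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[i][j] for j in range(cols) for i in range(rows)]

def uncoordinatize(vector, rows=3, cols=2):
    """Inverse of coordinatize: rebuild the rows x cols matrix."""
    return [[vector[j * rows + i] for j in range(cols)] for i in range(rows)]

X = [[3, 7], [-2, 4], [0, -3]]
Y = [[-1, 3], [4, 8], [-2, 5]]

# The linear combination 6X + 2Y, carried out entirely in C^6.
combo = [6 * a + 2 * b for a, b in zip(coordinatize(X), coordinatize(Y))]

# Translate the answer back to M32.
result = uncoordinatize(combo)
print(combo)
print(result)
```

Because `coordinatize` is linear and invertible, `result` agrees with the matrix obtained by performing the matrix operations directly, which is the point of the example.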


Reading  Questions 


1. The vector space of $3 \times 5$ matrices, $M_{35}$, is isomorphic to what fundamental vector space?

2.  A basis  for  C3  is 


B = 


1 

2 

-1 


3 

-1 

2 


Compute  pB 


-1 


3.  What  is  the  first  “surprise,”  and  why  is  it  surprising? 


Exercises 

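C10† In the vector space $\mathbb{C}^3$, compute the vector representation $\rho_B(\mathbf{v})$ for the basis $B$
and vector $\mathbf{v}$ below.

\[ B = \left\{\begin{bmatrix}2\\-2\\2\end{bmatrix},\ \begin{bmatrix}1\\3\\1\end{bmatrix},\ \begin{bmatrix}3\\5\\2\end{bmatrix}\right\}
\qquad \mathbf{v} = \begin{bmatrix}11\\5\\8\end{bmatrix} \]
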
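C20† Rework Example CM32 replacing the basis $B$ by the basis

\[ C = \left\{
\begin{bmatrix}-14&-9\\10&10\\-6&-2\end{bmatrix},\
\begin{bmatrix}-7&-4\\5&5\\-3&-1\end{bmatrix},\
\begin{bmatrix}-3&-1\\0&-2\\1&1\end{bmatrix},\
\begin{bmatrix}-7&-4\\3&2\\-1&0\end{bmatrix},\
\begin{bmatrix}4&2\\-3&-3\\2&1\end{bmatrix},\
\begin{bmatrix}0&0\\-1&-2\\1&1\end{bmatrix}
\right\} \]
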

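M10 Prove that the set $S$ below is a basis for the vector space of $2 \times 2$ matrices, $M_{22}$.
Do this by choosing a natural basis for $M_{22}$ and coordinatizing the elements of $S$ with
respect to this basis. Examine the resulting set of column vectors from $\mathbb{C}^4$ and apply the
Coordinatization Principle.

\[ S = \left\{
\begin{bmatrix}33&99\\78&-9\end{bmatrix},\
\begin{bmatrix}-16&-47\\-36&2\end{bmatrix},\
\begin{bmatrix}10&27\\17&3\end{bmatrix},\
\begin{bmatrix}-2&-7\\-6&4\end{bmatrix}
\right\} \]
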


M20† The set $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ is a basis of the vector space $P_3$, polynomials with
degree 3 or less. Therefore $\rho_B$ is a linear transformation, according to Theorem VRLT.
Find a “formula” for $\rho_B$. In other words, find an expression for $\rho_B\left(a + bx + cx^2 + dx^3\right)$.

\begin{align*}
\mathbf{v}_1 &= 1 - 5x - 22x^2 + 3x^3 & \mathbf{v}_2 &= -2 + 11x + 49x^2 - 8x^3\\
\mathbf{v}_3 &= -1 + 7x + 33x^2 - 8x^3 & \mathbf{v}_4 &= -1 + 4x + 16x^2 + x^3
\end{align*}


Section  MR 

Matrix  Representations 

We  have  seen  that  linear  transformations  whose  domain  and  codomain  are  vector 
spaces  of  columns  vectors  have  a close  relationship  with  matrices  (Theorem  MBLT, 
Theorem  MLTCV) . In  this  section,  we  will  extend  the  relationship  between  matrices 
and  linear  transformations  to  the  setting  of  linear  transformations  between  abstract 
vector  spaces. 


Subsection  MR 
Matrix  Representations 

This  is  a fundamental  definition. 

Definition  MR  Matrix  Representation 

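Suppose that $T\colon U \to V$ is a linear transformation, $B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_n\}$
is a basis for $U$ of size $n$, and $C$ is a basis for $V$ of size $m$. Then the matrix
representation of $T$ relative to $B$ and $C$ is the $m \times n$ matrix,

\[ M^T_{B,C} = \left[\, \rho_C(T(\mathbf{u}_1)) \,|\, \rho_C(T(\mathbf{u}_2)) \,|\, \rho_C(T(\mathbf{u}_3)) \,|\, \ldots \,|\, \rho_C(T(\mathbf{u}_n)) \,\right] \]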

□ 


Example  OLTTR  One  linear  transformation,  three  representations 
Consider the linear transformation, $S\colon P_3 \to M_{22}$, given by

\[ S\left(a + bx + cx^2 + dx^3\right) = \begin{bmatrix} 3a + 7b - 2c - 5d & 8a + 14b - 2c - 11d\\ -4a - 8b + 2c + 6d & 12a + 22b - 4c - 17d \end{bmatrix} \]


First,  we  build  a representation  relative  to  the  bases, 

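\[ B = \left\{1 + 2x + x^2 - x^3,\ 1 + 3x + x^2 + x^3,\ -1 - 2x + 2x^3,\ 2 + 3x + 2x^2 - 5x^3\right\} \]

\[ C = \left\{
\begin{bmatrix}1&1\\1&2\end{bmatrix},\
\begin{bmatrix}2&3\\2&5\end{bmatrix},\
\begin{bmatrix}-1&-1\\0&-2\end{bmatrix},\
\begin{bmatrix}-1&-4\\-2&-4\end{bmatrix}
\right\} \]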


We  evaluate  S with  each  element  of  the  basis  for  the  domain,  B,  and  coordinatize 
the  result  relative  to  the  vectors  in  the  basis  for  the  codomain,  C.  Notice  here  how 
we  take  elements  of  vector  spaces  and  decompose  them  into  linear  combinations  of 
basis  elements  as  the  key  step  in  constructing  coordinatizations  of  vectors.  There 
is  a system  of  equations  involved  almost  every  time,  but  we  will  omit  these  details 
since  this  should  be  a routine  exercise  at  this  stage. 


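\begin{align*}
\rho_C\left(S\left(1 + 2x + x^2 - x^3\right)\right) &= \rho_C\left(\begin{bmatrix}20&45\\-24&69\end{bmatrix}\right)\\
&= \rho_C\left((-90)\begin{bmatrix}1&1\\1&2\end{bmatrix} + 37\begin{bmatrix}2&3\\2&5\end{bmatrix} + (-40)\begin{bmatrix}-1&-1\\0&-2\end{bmatrix} + 4\begin{bmatrix}-1&-4\\-2&-4\end{bmatrix}\right)
= \begin{bmatrix}-90\\37\\-40\\4\end{bmatrix}\\
\rho_C\left(S\left(1 + 3x + x^2 + x^3\right)\right) &= \rho_C\left(\begin{bmatrix}17&37\\-20&57\end{bmatrix}\right)\\
&= \rho_C\left((-72)\begin{bmatrix}1&1\\1&2\end{bmatrix} + 29\begin{bmatrix}2&3\\2&5\end{bmatrix} + (-34)\begin{bmatrix}-1&-1\\0&-2\end{bmatrix} + 3\begin{bmatrix}-1&-4\\-2&-4\end{bmatrix}\right)
= \begin{bmatrix}-72\\29\\-34\\3\end{bmatrix}\\
\rho_C\left(S\left(-1 - 2x + 2x^3\right)\right) &= \rho_C\left(\begin{bmatrix}-27&-58\\32&-90\end{bmatrix}\right)\\
&= \rho_C\left(114\begin{bmatrix}1&1\\1&2\end{bmatrix} + (-46)\begin{bmatrix}2&3\\2&5\end{bmatrix} + 54\begin{bmatrix}-1&-1\\0&-2\end{bmatrix} + (-5)\begin{bmatrix}-1&-4\\-2&-4\end{bmatrix}\right)
= \begin{bmatrix}114\\-46\\54\\-5\end{bmatrix}\\
\rho_C\left(S\left(2 + 3x + 2x^2 - 5x^3\right)\right) &= \rho_C\left(\begin{bmatrix}48&109\\-58&167\end{bmatrix}\right)\\
&= \rho_C\left((-220)\begin{bmatrix}1&1\\1&2\end{bmatrix} + 91\begin{bmatrix}2&3\\2&5\end{bmatrix} + (-96)\begin{bmatrix}-1&-1\\0&-2\end{bmatrix} + 10\begin{bmatrix}-1&-4\\-2&-4\end{bmatrix}\right)
= \begin{bmatrix}-220\\91\\-96\\10\end{bmatrix}
\end{align*}

Thus, employing Definition MR

\[ M^S_{B,C} = \begin{bmatrix} -90 & -72 & 114 & -220\\ 37 & 29 & -46 & 91\\ -40 & -34 & 54 & -96\\ 4 & 3 & -5 & 10 \end{bmatrix} \]
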

Often  we  use  “nice”  bases  to  build  matrix  representations  and  the  work  involved 
is  much  easier.  Suppose  we  take  bases 


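\[ D = \left\{1,\ x,\ x^2,\ x^3\right\} \]

\[ E = \left\{
\begin{bmatrix}1&0\\0&0\end{bmatrix},\
\begin{bmatrix}0&1\\0&0\end{bmatrix},\
\begin{bmatrix}0&0\\1&0\end{bmatrix},\
\begin{bmatrix}0&0\\0&1\end{bmatrix}
\right\} \]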

The  evaluation  of  S at  the  elements  of  D is  easy  and  coordinatization  relative  to 
E can  be  done  on  sight, 


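\begin{align*}
\rho_E(S(1)) &= \rho_E\left(\begin{bmatrix}3&8\\-4&12\end{bmatrix}\right)\\
&= \rho_E\left(3\begin{bmatrix}1&0\\0&0\end{bmatrix} + 8\begin{bmatrix}0&1\\0&0\end{bmatrix} + (-4)\begin{bmatrix}0&0\\1&0\end{bmatrix} + 12\begin{bmatrix}0&0\\0&1\end{bmatrix}\right)
= \begin{bmatrix}3\\8\\-4\\12\end{bmatrix}\\
\rho_E(S(x)) &= \rho_E\left(\begin{bmatrix}7&14\\-8&22\end{bmatrix}\right)\\
&= \rho_E\left(7\begin{bmatrix}1&0\\0&0\end{bmatrix} + 14\begin{bmatrix}0&1\\0&0\end{bmatrix} + (-8)\begin{bmatrix}0&0\\1&0\end{bmatrix} + 22\begin{bmatrix}0&0\\0&1\end{bmatrix}\right)
= \begin{bmatrix}7\\14\\-8\\22\end{bmatrix}\\
\rho_E\left(S\left(x^2\right)\right) &= \rho_E\left(\begin{bmatrix}-2&-2\\2&-4\end{bmatrix}\right)\\
&= \rho_E\left((-2)\begin{bmatrix}1&0\\0&0\end{bmatrix} + (-2)\begin{bmatrix}0&1\\0&0\end{bmatrix} + 2\begin{bmatrix}0&0\\1&0\end{bmatrix} + (-4)\begin{bmatrix}0&0\\0&1\end{bmatrix}\right)
= \begin{bmatrix}-2\\-2\\2\\-4\end{bmatrix}\\
\rho_E\left(S\left(x^3\right)\right) &= \rho_E\left(\begin{bmatrix}-5&-11\\6&-17\end{bmatrix}\right)\\
&= \rho_E\left((-5)\begin{bmatrix}1&0\\0&0\end{bmatrix} + (-11)\begin{bmatrix}0&1\\0&0\end{bmatrix} + 6\begin{bmatrix}0&0\\1&0\end{bmatrix} + (-17)\begin{bmatrix}0&0\\0&1\end{bmatrix}\right)
= \begin{bmatrix}-5\\-11\\6\\-17\end{bmatrix}
\end{align*}

So the matrix representation of $S$ relative to $D$ and $E$ is

\[ M^S_{D,E} = \begin{bmatrix} 3 & 7 & -2 & -5\\ 8 & 14 & -2 & -11\\ -4 & -8 & 2 & 6\\ 12 & 22 & -4 & -17 \end{bmatrix} \]
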

One  more  time,  but  now  let  us  use  bases 
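\[ F = \left\{1 + x - x^2 + 2x^3,\ -1 + 2x + 2x^3,\ 2 + x - 2x^2 + 3x^3,\ 1 + x + 2x^3\right\} \]

\[ G = \left\{
\begin{bmatrix}1&1\\-1&2\end{bmatrix},\
\begin{bmatrix}-1&2\\0&2\end{bmatrix},\
\begin{bmatrix}2&1\\-2&3\end{bmatrix},\
\begin{bmatrix}1&1\\0&2\end{bmatrix}
\right\} \]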

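and evaluate $S$ with the elements of $F$, then coordinatize the results relative to $G$,

\begin{align*}
\rho_G\left(S\left(1 + x - x^2 + 2x^3\right)\right) &= \rho_G\left(\begin{bmatrix}2&2\\-2&4\end{bmatrix}\right)
= \rho_G\left(2\begin{bmatrix}1&1\\-1&2\end{bmatrix}\right) = \begin{bmatrix}2\\0\\0\\0\end{bmatrix}\\
\rho_G\left(S\left(-1 + 2x + 2x^3\right)\right) &= \rho_G\left(\begin{bmatrix}1&-2\\0&-2\end{bmatrix}\right)
= \rho_G\left((-1)\begin{bmatrix}-1&2\\0&2\end{bmatrix}\right) = \begin{bmatrix}0\\-1\\0\\0\end{bmatrix}\\
\rho_G\left(S\left(2 + x - 2x^2 + 3x^3\right)\right) &= \rho_G\left(\begin{bmatrix}2&1\\-2&3\end{bmatrix}\right) = \begin{bmatrix}0\\0\\1\\0\end{bmatrix}\\
\rho_G\left(S\left(1 + x + 2x^3\right)\right) &= \rho_G\left(\begin{bmatrix}0&0\\0&0\end{bmatrix}\right) = \begin{bmatrix}0\\0\\0\\0\end{bmatrix}
\end{align*}

So we arrive at an especially economical matrix representation,

\[ M^S_{F,G} = \begin{bmatrix} 2 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix} \]

△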
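The three representations of Example OLTTR can be recomputed mechanically. The sketch below is an illustration in plain Python under stated assumptions: a polynomial $a + bx + cx^2 + dx^3$ is stored as the tuple `(a, b, c, d)`, a $2 \times 2$ matrix is stored row by row as a 4-tuple, and the helper names `solve` and `matrix_representation` are invented for this sketch. Each column of the representation is found by solving the linear system that coordinatizes $S(\mathbf{u}_j)$ relative to the codomain basis, which is the "routine exercise" the example omits.

```python
from fractions import Fraction

def S(p):
    """The linear transformation S of Example OLTTR; output flattened row by row."""
    a, b, c, d = p
    return [3*a + 7*b - 2*c - 5*d, 8*a + 14*b - 2*c - 11*d,
            -4*a - 8*b + 2*c + 6*d, 12*a + 22*b - 4*c - 17*d]

def solve(columns, target):
    """Solve sum_j x_j * columns[j] = target by Gauss-Jordan elimination.
    Assumes the columns form a basis (square, nonsingular system)."""
    n = len(target)
    aug = [[Fraction(columns[j][i]) for j in range(n)] + [Fraction(target[i])]
           for i in range(n)]
    for col in range(n):
        pr = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pr] = aug[pr], aug[col]
        pivot = aug[col][col]
        aug[col] = [x / pivot for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

def matrix_representation(domain_basis, codomain_basis):
    """Columns are the coordinatizations of S(u) for each u in the domain basis."""
    cols = [solve(codomain_basis, S(u)) for u in domain_basis]
    return [[cols[j][i] for j in range(len(cols))]
            for i in range(len(codomain_basis))]

B = [(1, 2, 1, -1), (1, 3, 1, 1), (-1, -2, 0, 2), (2, 3, 2, -5)]
C = [(1, 1, 1, 2), (2, 3, 2, 5), (-1, -1, 0, -2), (-1, -4, -2, -4)]
D = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
E = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
F = [(1, 1, -1, 2), (-1, 2, 0, 2), (2, 1, -2, 3), (1, 1, 0, 2)]
G = [(1, 1, -1, 2), (-1, 2, 0, 2), (2, 1, -2, 3), (1, 1, 0, 2)]

M_BC = matrix_representation(B, C)
M_DE = matrix_representation(D, E)
M_FG = matrix_representation(F, G)
for row in M_FG:
    print([int(x) for x in row])
```

Exact rational arithmetic (`Fraction`) is used so that no rounding enters the comparison with the hand computations; the same function works for any pair of bases of $P_3$ and $M_{22}$ encoded this way.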

We  may  choose  to  use  whatever  terms  we  want  when  we  make  a definition.  Some 
are  arbitrary,  while  others  make  sense,  but  only  in  light  of  subsequent  theorems. 
Matrix  representation  is  in  the  latter  category.  We  begin  with  a linear  transformation 
and  produce  a matrix.  So  what?  Here  is  the  theorem  that  justifies  the  term  “matrix 
representation.” 

Theorem  FTMR  Fundamental  Theorem  of  Matrix  Representation 
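Suppose that $T\colon U \to V$ is a linear transformation, $B$ is a basis for $U$, $C$ is a basis
for $V$ and $M^T_{B,C}$ is the matrix representation of $T$ relative to $B$ and $C$. Then, for
any $\mathbf{u} \in U$,

\[ \rho_C(T(\mathbf{u})) = M^T_{B,C}\left(\rho_B(\mathbf{u})\right) \]

or equivalently

\[ T(\mathbf{u}) = \rho_C^{-1}\left(M^T_{B,C}\left(\rho_B(\mathbf{u})\right)\right) \]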

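Proof. Let $B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_n\}$ be the basis of $U$. Since $\mathbf{u} \in U$, there are
scalars $a_1, a_2, a_3, \ldots, a_n$ such that

\[ \mathbf{u} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_n\mathbf{u}_n \]

Then,

\begin{align*}
M^T_{B,C}\,\rho_B(\mathbf{u}) &= \left[\, \rho_C(T(\mathbf{u}_1)) \,|\, \rho_C(T(\mathbf{u}_2)) \,|\, \rho_C(T(\mathbf{u}_3)) \,|\, \ldots \,|\, \rho_C(T(\mathbf{u}_n)) \,\right]\rho_B(\mathbf{u}) && \text{Definition MR}
\end{align*}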

§MR 


Beezer:  A First  Course  in  Linear  Algebra 


518 


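\begin{align*}
&= \left[\, \rho_C(T(\mathbf{u}_1)) \,|\, \rho_C(T(\mathbf{u}_2)) \,|\, \rho_C(T(\mathbf{u}_3)) \,|\, \ldots \,|\, \rho_C(T(\mathbf{u}_n)) \,\right]\begin{bmatrix}a_1\\a_2\\a_3\\\vdots\\a_n\end{bmatrix} && \text{Definition VR}\\
&= a_1\rho_C(T(\mathbf{u}_1)) + a_2\rho_C(T(\mathbf{u}_2)) + \cdots + a_n\rho_C(T(\mathbf{u}_n)) && \text{Definition MVP}\\
&= \rho_C\left(a_1T(\mathbf{u}_1) + a_2T(\mathbf{u}_2) + a_3T(\mathbf{u}_3) + \cdots + a_nT(\mathbf{u}_n)\right) && \text{Theorem LTLC}\\
&= \rho_C\left(T\left(a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_n\mathbf{u}_n\right)\right) && \text{Theorem LTLC}\\
&= \rho_C(T(\mathbf{u}))
\end{align*}

The alternative conclusion is obtained as

\begin{align*}
T(\mathbf{u}) &= I_V(T(\mathbf{u})) && \text{Definition IDLT}\\
&= \left(\rho_C^{-1} \circ \rho_C\right)(T(\mathbf{u})) && \text{Definition IVLT}\\
&= \rho_C^{-1}\left(\rho_C(T(\mathbf{u}))\right) && \text{Definition LTC}\\
&= \rho_C^{-1}\left(M^T_{B,C}\left(\rho_B(\mathbf{u})\right)\right)
\end{align*}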

This theorem says that we can apply $T$ to $\mathbf{u}$ and coordinatize the result relative
to $C$ in $V$, or we can first coordinatize $\mathbf{u}$ relative to $B$ in $U$, then multiply by the
matrix representation. Either way, the result is the same. So the effect of a linear
transformation can always be accomplished by a matrix-vector product (Definition
MVP). That is important enough to say again. The effect of a linear transformation
is a matrix-vector product.


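\[
\begin{array}{ccc}
\mathbf{u} & \xrightarrow{\quad T \quad} & T(\mathbf{u})\\
\Big\downarrow {\scriptstyle \rho_B} & & \Big\downarrow {\scriptstyle \rho_C}\\
\rho_B(\mathbf{u}) & \xrightarrow{\quad M^T_{B,C} \quad} & M^T_{B,C}\,\rho_B(\mathbf{u}) = \rho_C(T(\mathbf{u}))
\end{array}
\]

Diagram FTMR: Fundamental Theorem of Matrix Representations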

The alternative conclusion of this result might be even more striking. It says that
to effect a linear transformation ($T$) of a vector ($\mathbf{u}$), coordinatize the input (with $\rho_B$),
do a matrix-vector product (with $M^T_{B,C}$), and un-coordinatize the result (with $\rho_C^{-1}$).
So, absent some bookkeeping about vector representations, a linear transformation
is a matrix. To adjust the diagram, we “reverse” the arrow on the right, which
means inverting the vector representation $\rho_C$ on $V$. Now we can go directly across
the top of the diagram, computing the linear transformation between the abstract
vector spaces. Or, we can go around the other three sides, using vector representation,
a matrix-vector product, followed by un-coordinatization.

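\[
\begin{array}{ccc}
\mathbf{u} & \xrightarrow{\quad T \quad} & T(\mathbf{u}) = \rho_C^{-1}\left(M^T_{B,C}\,\rho_B(\mathbf{u})\right)\\
\Big\downarrow {\scriptstyle \rho_B} & & \Big\uparrow {\scriptstyle \rho_C^{-1}}\\
\rho_B(\mathbf{u}) & \xrightarrow{\quad M^T_{B,C} \quad} & M^T_{B,C}\,\rho_B(\mathbf{u})
\end{array}
\]

Diagram FTMRA: Fundamental Theorem of Matrix Representations (Alternate)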


Here  is  an  example  to  illustrate  how  the  “action”  of  a linear  transformation  can 
be  effected  by  matrix  multiplication. 


Example  ALTMM  A linear  transformation  as  matrix  multiplication 
In  Example  OLTTR  we  found  three  representations  of  the  linear  transformation  S. 
In  this  example,  we  will  compute  a single  output  of  S in  four  different  ways.  First 
“normally,”  then  three  times  over  using  Theorem  FTMR. 

Choose $p(x) = 3 - x + 2x^2 - 5x^3$, for no particular reason. Then the straightforward
application of $S$ to $p(x)$ yields

\begin{align*}
S(p(x)) &= S\left(3 - x + 2x^2 - 5x^3\right)\\
&= \begin{bmatrix} 3(3) + 7(-1) - 2(2) - 5(-5) & 8(3) + 14(-1) - 2(2) - 11(-5)\\ -4(3) - 8(-1) + 2(2) + 6(-5) & 12(3) + 22(-1) - 4(2) - 17(-5) \end{bmatrix}\\
&= \begin{bmatrix} 23 & 61\\ -30 & 91 \end{bmatrix}
\end{align*}

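Now use the representation of $S$ relative to the bases $B$ and $C$ and Theorem
FTMR. Note that we will employ the following linear combination in moving from
the second line to the third,

\begin{align*}
3 - x + 2x^2 - 5x^3 = {} & 48\left(1 + 2x + x^2 - x^3\right) + (-20)\left(1 + 3x + x^2 + x^3\right)\\
& + (-1)\left(-1 - 2x + 2x^3\right) + (-13)\left(2 + 3x + 2x^2 - 5x^3\right)
\end{align*}

\begin{align*}
S(p(x)) &= \rho_C^{-1}\left(M^S_{B,C}\,\rho_B(p(x))\right)\\
&= \rho_C^{-1}\left(M^S_{B,C}\,\rho_B\left(3 - x + 2x^2 - 5x^3\right)\right)\\
&= \rho_C^{-1}\left(M^S_{B,C}\,\rho_B\left(48\left(1 + 2x + x^2 - x^3\right) + (-20)\left(1 + 3x + x^2 + x^3\right) + (-1)\left(-1 - 2x + 2x^3\right) + (-13)\left(2 + 3x + 2x^2 - 5x^3\right)\right)\right)\\
&= \rho_C^{-1}\left(\begin{bmatrix} -90 & -72 & 114 & -220\\ 37 & 29 & -46 & 91\\ -40 & -34 & 54 & -96\\ 4 & 3 & -5 & 10 \end{bmatrix}\begin{bmatrix}48\\-20\\-1\\-13\end{bmatrix}\right)\\
&= \rho_C^{-1}\left(\begin{bmatrix}-134\\59\\-46\\7\end{bmatrix}\right)\\
&= (-134)\begin{bmatrix}1&1\\1&2\end{bmatrix} + 59\begin{bmatrix}2&3\\2&5\end{bmatrix} + (-46)\begin{bmatrix}-1&-1\\0&-2\end{bmatrix} + 7\begin{bmatrix}-1&-4\\-2&-4\end{bmatrix}\\
&= \begin{bmatrix} 23 & 61\\ -30 & 91 \end{bmatrix}
\end{align*}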

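Again, but now with “nice” bases like $D$ and $E$, and the computations are more
transparent.

\begin{align*}
S(p(x)) &= \rho_E^{-1}\left(M^S_{D,E}\,\rho_D(p(x))\right)\\
&= \rho_E^{-1}\left(M^S_{D,E}\,\rho_D\left(3 - x + 2x^2 - 5x^3\right)\right)\\
&= \rho_E^{-1}\left(M^S_{D,E}\,\rho_D\left(3(1) + (-1)(x) + 2\left(x^2\right) + (-5)\left(x^3\right)\right)\right)\\
&= \rho_E^{-1}\left(\begin{bmatrix} 3 & 7 & -2 & -5\\ 8 & 14 & -2 & -11\\ -4 & -8 & 2 & 6\\ 12 & 22 & -4 & -17 \end{bmatrix}\begin{bmatrix}3\\-1\\2\\-5\end{bmatrix}\right)\\
&= \rho_E^{-1}\left(\begin{bmatrix}23\\61\\-30\\91\end{bmatrix}\right)\\
&= \begin{bmatrix} 23 & 61\\ -30 & 91 \end{bmatrix}
\end{align*}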
OK, last time, now with the bases $F$ and $G$. The coordinatizations will take
some work this time, but the matrix-vector product (Definition MVP) (which is the
actual action of the linear transformation) will be especially easy, given the diagonal
nature of the matrix representation, $M^S_{F,G}$. Here we go,

\begin{align*}
S(p(x)) &= \rho_G^{-1}\left(M^S_{F,G}\,\rho_F(p(x))\right)\\
&= \rho_G^{-1}\left(M^S_{F,G}\,\rho_F\left(3 - x + 2x^2 - 5x^3\right)\right)\\
&= \rho_G^{-1}\left(M^S_{F,G}\,\rho_F\left(32\left(1 + x - x^2 + 2x^3\right) - 7\left(-1 + 2x + 2x^3\right) - 17\left(2 + x - 2x^2 + 3x^3\right) - 2\left(1 + x + 2x^3\right)\right)\right)\\
&= \rho_G^{-1}\left(M^S_{F,G}\begin{bmatrix}32\\-7\\-17\\-2\end{bmatrix}\right)\\
&= \rho_G^{-1}\left(\begin{bmatrix} 2 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix}32\\-7\\-17\\-2\end{bmatrix}\right)\\
&= \rho_G^{-1}\left(\begin{bmatrix}64\\7\\-17\\0\end{bmatrix}\right)\\
&= 64\begin{bmatrix}1&1\\-1&2\end{bmatrix} + 7\begin{bmatrix}-1&2\\0&2\end{bmatrix} + (-17)\begin{bmatrix}2&1\\-2&3\end{bmatrix} + 0\begin{bmatrix}1&1\\0&2\end{bmatrix}\\
&= \begin{bmatrix} 23 & 61\\ -30 & 91 \end{bmatrix}
\end{align*}

This example is not meant to necessarily illustrate that any one of these four
computations is simpler than the others. Instead, it is meant to illustrate the many
different ways we can arrive at the same result, with the last three all employing a
matrix representation to effect the linear transformation. △
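The agreement claimed in Example ALTMM can be checked numerically. The following plain-Python sketch is only an illustration: matrices in $M_{22}$ are stored row by row as lists of four numbers, the helper names `mat_vec` and `uncoordinatize` are invented here, and the coordinate vectors of $p(x)$ relative to $B$ and $F$ are taken from the linear combinations stated in the example rather than recomputed.

```python
def S(p):
    """S applied to a + bx + cx^2 + dx^3, output flattened row by row."""
    a, b, c, d = p
    return [3*a + 7*b - 2*c - 5*d, 8*a + 14*b - 2*c - 11*d,
            -4*a - 8*b + 2*c + 6*d, 12*a + 22*b - 4*c - 17*d]

def mat_vec(M, v):
    """Matrix-vector product (Definition MVP), computed entry by entry."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def uncoordinatize(coords, basis):
    """Linear combination of the (flattened) basis matrices."""
    return [sum(c * m[i] for c, m in zip(coords, basis)) for i in range(4)]

p = (3, -1, 2, -5)
direct = S(p)

# Route through the bases B and C.
M_BC = [[-90, -72, 114, -220], [37, 29, -46, 91],
        [-40, -34, 54, -96], [4, 3, -5, 10]]
C = [(1, 1, 1, 2), (2, 3, 2, 5), (-1, -1, 0, -2), (-1, -4, -2, -4)]
rho_B_p = [48, -20, -1, -13]
via_BC = uncoordinatize(mat_vec(M_BC, rho_B_p), C)

# Route through the bases F and G.
M_FG = [[2, 0, 0, 0], [0, -1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
G = [(1, 1, -1, 2), (-1, 2, 0, 2), (2, 1, -2, 3), (1, 1, 0, 2)]
rho_F_p = [32, -7, -17, -2]
via_FG = uncoordinatize(mat_vec(M_FG, rho_F_p), G)

print(direct, via_BC, via_FG)
```

All three lists describe the same $2 \times 2$ matrix, which is the content of Theorem FTMR for this particular input.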


We will use Theorem FTMR frequently in the next few sections. A typical
application will feel like the linear transformation $T$ “commutes” with a vector
representation, $\rho_C$, and as it does the transformation morphs into a matrix, $M^T_{B,C}$,
while the vector representation changes to a new basis, $\rho_B$. Or vice-versa.


Subsection  NRFO 

New  Representations  from  Old 

In Subsection LT.NLTFO we built new linear transformations from other linear
transformations. Sums, scalar multiples and compositions. These new linear
transformations will have matrix representations as well. How do the new matrix representations
relate to the old matrix representations? Here are the three theorems.

Theorem  MRSLT  Matrix  Representation  of  a Sum  of  Linear  Transformations 
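Suppose that $T\colon U \to V$ and $S\colon U \to V$ are linear transformations, $B$ is a basis of
$U$ and $C$ is a basis of $V$. Then

\[ M^{T+S}_{B,C} = M^T_{B,C} + M^S_{B,C} \]

Proof. Let $\mathbf{x}$ be any vector in $\mathbb{C}^n$. Define $\mathbf{u} \in U$ by $\mathbf{u} = \rho_B^{-1}(\mathbf{x})$, so $\mathbf{x} = \rho_B(\mathbf{u})$.
Then,

\begin{align*}
M^{T+S}_{B,C}\,\mathbf{x} &= M^{T+S}_{B,C}\,\rho_B(\mathbf{u}) && \text{Substitution}\\
&= \rho_C\left((T + S)(\mathbf{u})\right) && \text{Theorem FTMR}\\
&= \rho_C\left(T(\mathbf{u}) + S(\mathbf{u})\right) && \text{Definition LTA}\\
&= \rho_C(T(\mathbf{u})) + \rho_C(S(\mathbf{u})) && \text{Definition LT}\\
&= M^T_{B,C}\left(\rho_B(\mathbf{u})\right) + M^S_{B,C}\left(\rho_B(\mathbf{u})\right) && \text{Theorem FTMR}\\
&= \left(M^T_{B,C} + M^S_{B,C}\right)\rho_B(\mathbf{u}) && \text{Theorem MMDAA}\\
&= \left(M^T_{B,C} + M^S_{B,C}\right)\mathbf{x} && \text{Substitution}
\end{align*}

Since the matrices $M^{T+S}_{B,C}$ and $M^T_{B,C} + M^S_{B,C}$ have equal matrix-vector products
for every vector in $\mathbb{C}^n$, by Theorem EMMVP they are equal matrices. (Now would
be a good time to double-back and study the proof of Theorem EMMVP. You did
promise to come back to this theorem sometime, didn’t you?) ■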


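Theorem MRMLT Matrix Representation of a Multiple of a Linear Transformation

Suppose that $T\colon U \to V$ is a linear transformation, $\alpha \in \mathbb{C}$, $B$ is a basis of $U$ and $C$
is a basis of $V$. Then

\[ M^{\alpha T}_{B,C} = \alpha M^T_{B,C} \]

Proof. Let $\mathbf{x}$ be any vector in $\mathbb{C}^n$. Define $\mathbf{u} \in U$ by $\mathbf{u} = \rho_B^{-1}(\mathbf{x})$, so $\mathbf{x} = \rho_B(\mathbf{u})$.
Then,

\begin{align*}
M^{\alpha T}_{B,C}\,\mathbf{x} &= M^{\alpha T}_{B,C}\,\rho_B(\mathbf{u}) && \text{Substitution}\\
&= \rho_C\left((\alpha T)(\mathbf{u})\right) && \text{Theorem FTMR}\\
&= \rho_C\left(\alpha T(\mathbf{u})\right) && \text{Definition LTSM}\\
&= \alpha\rho_C(T(\mathbf{u})) && \text{Definition LT}\\
&= \alpha\left(M^T_{B,C}\,\rho_B(\mathbf{u})\right) && \text{Theorem FTMR}\\
&= \left(\alpha M^T_{B,C}\right)\rho_B(\mathbf{u}) && \text{Theorem MMSMM}\\
&= \left(\alpha M^T_{B,C}\right)\mathbf{x} && \text{Substitution}
\end{align*}

Since the matrices $M^{\alpha T}_{B,C}$ and $\alpha M^T_{B,C}$ have equal matrix-vector products for every
vector in $\mathbb{C}^n$, by Theorem EMMVP they are equal matrices. ■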


The  vector  space  of  all  linear  transformations  from  U to  V is  now  isomorphic  to 
the  vector  space  of  all  m x n matrices. 


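Theorem MRCLT Matrix Representation of a Composition of Linear Transformations

Suppose that $T\colon U \to V$ and $S\colon V \to W$ are linear transformations, $B$ is a basis of
$U$, $C$ is a basis of $V$, and $D$ is a basis of $W$. Then

\[ M^{S \circ T}_{B,D} = M^S_{C,D}\,M^T_{B,C} \]

Proof. Let $\mathbf{x}$ be any vector in $\mathbb{C}^n$. Define $\mathbf{u} \in U$ by $\mathbf{u} = \rho_B^{-1}(\mathbf{x})$, so $\mathbf{x} = \rho_B(\mathbf{u})$.
Then,

\begin{align*}
M^{S \circ T}_{B,D}\,\mathbf{x} &= M^{S \circ T}_{B,D}\,\rho_B(\mathbf{u}) && \text{Substitution}\\
&= \rho_D\left((S \circ T)(\mathbf{u})\right) && \text{Theorem FTMR}\\
&= \rho_D\left(S(T(\mathbf{u}))\right) && \text{Definition LTC}\\
&= M^S_{C,D}\,\rho_C(T(\mathbf{u})) && \text{Theorem FTMR}\\
&= M^S_{C,D}\left(M^T_{B,C}\,\rho_B(\mathbf{u})\right) && \text{Theorem FTMR}\\
&= \left(M^S_{C,D}\,M^T_{B,C}\right)\rho_B(\mathbf{u}) && \text{Theorem MMA}\\
&= \left(M^S_{C,D}\,M^T_{B,C}\right)\mathbf{x} && \text{Substitution}
\end{align*}

Since the matrices $M^{S \circ T}_{B,D}$ and $M^S_{C,D}\,M^T_{B,C}$ have equal matrix-vector products for
every vector in $\mathbb{C}^n$, by Theorem EMMVP they are equal matrices. ■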


This  is  the  second  great  surprise  of  introductory  linear  algebra.  Matrices  are 
linear  transformations  (functions,  really),  and  matrix  multiplication  is  function 
composition!  We  can  form  the  composition  of  two  linear  transformations,  then  form 
the  matrix  representation  of  the  result.  Or  we  can  form  the  matrix  representation  of 
each  linear  transformation  separately,  then  multiply  the  two  representations  together 
via  Definition  MM.  In  either  case,  we  arrive  at  the  same  result. 


Example  MPMR  Matrix  product  of  matrix  representations 
Consider  the  two  linear  transformations, 


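\begin{align*}
T\colon \mathbb{C}^2 \to P_2 & \qquad T\left(\begin{bmatrix}a\\b\end{bmatrix}\right) = (-a + 3b) + (2a + 4b)x + (a - 2b)x^2\\
S\colon P_2 \to M_{22} & \qquad S\left(a + bx + cx^2\right) = \begin{bmatrix} 2a + b + 2c & a + 4b - c\\ -a + 3c & 3a + b + 2c \end{bmatrix}
\end{align*}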
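and bases for $\mathbb{C}^2$, $P_2$ and $M_{22}$ (respectively),

\begin{align*}
B &= \left\{\begin{bmatrix}3\\1\end{bmatrix},\ \begin{bmatrix}2\\1\end{bmatrix}\right\}\\
C &= \left\{1 - 2x + x^2,\ -1 + 3x,\ 2x + 3x^2\right\}\\
D &= \left\{
\begin{bmatrix}1&-2\\1&-1\end{bmatrix},\
\begin{bmatrix}1&-1\\1&2\end{bmatrix},\
\begin{bmatrix}-1&2\\0&0\end{bmatrix},\
\begin{bmatrix}2&-3\\2&2\end{bmatrix}
\right\}
\end{align*}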

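Begin by computing the new linear transformation that is the composition of $T$
and $S$ (Definition LTC, Theorem CLTLT), $(S \circ T)\colon \mathbb{C}^2 \to M_{22}$,

\begin{align*}
(S \circ T)\left(\begin{bmatrix}a\\b\end{bmatrix}\right) &= S\left(T\left(\begin{bmatrix}a\\b\end{bmatrix}\right)\right)\\
&= S\left((-a + 3b) + (2a + 4b)x + (a - 2b)x^2\right)\\
&= \begin{bmatrix} 2(-a + 3b) + (2a + 4b) + 2(a - 2b) & (-a + 3b) + 4(2a + 4b) - (a - 2b)\\ -(-a + 3b) + 3(a - 2b) & 3(-a + 3b) + (2a + 4b) + 2(a - 2b) \end{bmatrix}\\
&= \begin{bmatrix} 2a + 6b & 6a + 21b\\ 4a - 9b & a + 9b \end{bmatrix}
\end{align*}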

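Now compute the matrix representations (Definition MR) for each of these three
linear transformations ($T$, $S$, $S \circ T$), relative to the appropriate bases. First for $T$,

\begin{align*}
\rho_C\left(T\left(\begin{bmatrix}3\\1\end{bmatrix}\right)\right) &= \rho_C\left(10x + x^2\right)\\
&= \rho_C\left(28\left(1 - 2x + x^2\right) + 28(-1 + 3x) + (-9)\left(2x + 3x^2\right)\right) = \begin{bmatrix}28\\28\\-9\end{bmatrix}\\
\rho_C\left(T\left(\begin{bmatrix}2\\1\end{bmatrix}\right)\right) &= \rho_C(1 + 8x)\\
&= \rho_C\left(33\left(1 - 2x + x^2\right) + 32(-1 + 3x) + (-11)\left(2x + 3x^2\right)\right) = \begin{bmatrix}33\\32\\-11\end{bmatrix}
\end{align*}

So we have the matrix representation of $T$,

\[ M^T_{B,C} = \begin{bmatrix} 28 & 33\\ 28 & 32\\ -9 & -11 \end{bmatrix} \]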
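Now, a representation of $S$,

\begin{align*}
\rho_D\left(S\left(1 - 2x + x^2\right)\right) &= \rho_D\left(\begin{bmatrix}2&-8\\2&3\end{bmatrix}\right)\\
&= \rho_D\left((-11)\begin{bmatrix}1&-2\\1&-1\end{bmatrix} + (-21)\begin{bmatrix}1&-1\\1&2\end{bmatrix} + 0\begin{bmatrix}-1&2\\0&0\end{bmatrix} + 17\begin{bmatrix}2&-3\\2&2\end{bmatrix}\right)
= \begin{bmatrix}-11\\-21\\0\\17\end{bmatrix}\\
\rho_D\left(S(-1 + 3x)\right) &= \rho_D\left(\begin{bmatrix}1&11\\1&0\end{bmatrix}\right)\\
&= \rho_D\left(26\begin{bmatrix}1&-2\\1&-1\end{bmatrix} + 51\begin{bmatrix}1&-1\\1&2\end{bmatrix} + 0\begin{bmatrix}-1&2\\0&0\end{bmatrix} + (-38)\begin{bmatrix}2&-3\\2&2\end{bmatrix}\right)
= \begin{bmatrix}26\\51\\0\\-38\end{bmatrix}\\
\rho_D\left(S\left(2x + 3x^2\right)\right) &= \rho_D\left(\begin{bmatrix}8&5\\9&8\end{bmatrix}\right)\\
&= \rho_D\left(34\begin{bmatrix}1&-2\\1&-1\end{bmatrix} + 67\begin{bmatrix}1&-1\\1&2\end{bmatrix} + 1\begin{bmatrix}-1&2\\0&0\end{bmatrix} + (-46)\begin{bmatrix}2&-3\\2&2\end{bmatrix}\right)
= \begin{bmatrix}34\\67\\1\\-46\end{bmatrix}
\end{align*}

So we have the matrix representation of $S$,

\[ M^S_{C,D} = \begin{bmatrix} -11 & 26 & 34\\ -21 & 51 & 67\\ 0 & 0 & 1\\ 17 & -38 & -46 \end{bmatrix} \]
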

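Finally, a representation of $S \circ T$,

\begin{align*}
\rho_D\left((S \circ T)\left(\begin{bmatrix}3\\1\end{bmatrix}\right)\right) &= \rho_D\left(\begin{bmatrix}12&39\\3&12\end{bmatrix}\right)\\
&= \rho_D\left(114\begin{bmatrix}1&-2\\1&-1\end{bmatrix} + 237\begin{bmatrix}1&-1\\1&2\end{bmatrix} + (-9)\begin{bmatrix}-1&2\\0&0\end{bmatrix} + (-174)\begin{bmatrix}2&-3\\2&2\end{bmatrix}\right)
= \begin{bmatrix}114\\237\\-9\\-174\end{bmatrix}\\
\rho_D\left((S \circ T)\left(\begin{bmatrix}2\\1\end{bmatrix}\right)\right) &= \rho_D\left(\begin{bmatrix}10&33\\-1&11\end{bmatrix}\right)\\
&= \rho_D\left(95\begin{bmatrix}1&-2\\1&-1\end{bmatrix} + 202\begin{bmatrix}1&-1\\1&2\end{bmatrix} + (-11)\begin{bmatrix}-1&2\\0&0\end{bmatrix} + (-149)\begin{bmatrix}2&-3\\2&2\end{bmatrix}\right)
= \begin{bmatrix}95\\202\\-11\\-149\end{bmatrix}
\end{align*}

So we have the matrix representation of $S \circ T$,

\[ M^{S \circ T}_{B,D} = \begin{bmatrix} 114 & 95\\ 237 & 202\\ -9 & -11\\ -174 & -149 \end{bmatrix} \]
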

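Now, we are all set to verify the conclusion of Theorem MRCLT,

\begin{align*}
M^S_{C,D}\,M^T_{B,C} &= \begin{bmatrix} -11 & 26 & 34\\ -21 & 51 & 67\\ 0 & 0 & 1\\ 17 & -38 & -46 \end{bmatrix}\begin{bmatrix} 28 & 33\\ 28 & 32\\ -9 & -11 \end{bmatrix}\\
&= \begin{bmatrix} 114 & 95\\ 237 & 202\\ -9 & -11\\ -174 & -149 \end{bmatrix}\\
&= M^{S \circ T}_{B,D}
\end{align*}
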

We  have  intentionally  used  nonstandard  bases.  If  you  were  to  choose  “nice”  bases 
for  the  three  vector  spaces,  then  the  result  of  the  theorem  might  be  rather  transparent. 
But  this  would  still  be  a worthwhile  exercise  — give  it  a go.  A 
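The verification at the end of Example MPMR can be repeated in code. The sketch below is a plain-Python illustration (the helper names `mat_mul`, `compose` and `combine` are assumptions of this sketch, and $2 \times 2$ matrices are stored row by row as lists of four numbers). It multiplies the two representations and also confirms that each column of the product really is the coordinatization, relative to $D$, of $S \circ T$ applied to the corresponding vector of $B$.

```python
def mat_mul(A, B):
    """Matrix product, computed entry by entry."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def compose(v):
    """(S o T) applied to the column vector (a, b), flattened row by row."""
    a, b = v
    return [2*a + 6*b, 6*a + 21*b, 4*a - 9*b, a + 9*b]

def combine(coords, basis):
    """Linear combination of flattened basis matrices (un-coordinatization)."""
    return [sum(c * m[i] for c, m in zip(coords, basis)) for i in range(4)]

M_S = [[-11, 26, 34], [-21, 51, 67], [0, 0, 1], [17, -38, -46]]   # S relative to C, D
M_T = [[28, 33], [28, 32], [-9, -11]]                             # T relative to B, C
B = [(3, 1), (2, 1)]
D = [(1, -2, 1, -1), (1, -1, 1, 2), (-1, 2, 0, 0), (2, -3, 2, 2)]

product = mat_mul(M_S, M_T)

# Column j of the product should un-coordinatize to (S o T)(B[j]).
columns = [[row[j] for row in product] for j in range(2)]
checks = [combine(columns[j], D) == compose(B[j]) for j in range(2)]
print(product)
print(checks)
```

Multiplying the representations and representing the composition are two routes to the same matrix, which is exactly the statement of Theorem MRCLT.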

A diagram,  similar  to  ones  we  have  seen  earlier,  might  make  the  importance  of 
this  theorem  clearer, 
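\[
\begin{array}{ccc}
S,\ T & \xrightarrow{\ \text{Definition MR}\ } & M^S_{C,D},\ M^T_{B,C}\\
\Big\downarrow {\scriptstyle \text{Definition LTC}} & & \Big\downarrow {\scriptstyle \text{Definition MM}}\\
S \circ T & \xrightarrow{\ \text{Definition MR}\ } & M^{S \circ T}_{B,D}
\end{array}
\]

Diagram MRCLT: Matrix Representation and Composition of Linear
Transformations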

One  of  our  goals  in  the  first  part  of  this  book  is  to  make  the  definition  of 
matrix  multiplication  (Definition  MVP,  Definition  MM)  seem  as  natural  as  possible. 
However,  many  of  us  are  brought  up  with  an  entry-by-entry  description  of  matrix 
multiplication  (Theorem  EMP)  as  the  definition  of  matrix  multiplication,  and 
then  theorems  about  columns  of  matrices  and  linear  combinations  follow  from  that 
definition.  With  this  unmotivated  definition,  the  realization  that  matrix  multiplication 
is  function  composition  is  quite  remarkable.  It  is  an  interesting  exercise  to  begin  with 
the  question,  “What  is  the  matrix  representation  of  the  composition  of  two  linear 
transformations?”  and  then,  without  using  any  theorems  about  matrix  multiplication, 
finally  arrive  at  the  entry-by-entry  description  of  matrix  multiplication.  Try  it  yourself 
(Exercise  MR.T80). 


Subsection PMR
Properties of Matrix Representations

It will not be a surprise to discover that the kernel and range of a linear transformation are closely related to the null space and column space of the transformation's matrix representation. Perhaps this idea has been bouncing around in your head already, even before seeing the definition of a matrix representation. However, with a formal definition of a matrix representation (Definition MR), and a fundamental theorem to go with it (Theorem FTMR) we can be formal about the relationship, using the idea of isomorphic vector spaces (Definition IVS). Here are the twin theorems.

Theorem KNSI Kernel and Null Space Isomorphism
Suppose that $T: U \to V$ is a linear transformation, $B$ is a basis for $U$ of size $n$, and $C$ is a basis for $V$. Then the kernel of $T$ is isomorphic to the null space of $M^T_{B,C}$,

\[ \mathcal{K}(T) \cong \mathcal{N}(M^T_{B,C}) \]

Proof. To establish that two vector spaces are isomorphic, we must find an isomorphism between them, an invertible linear transformation (Definition IVS). The kernel of the linear transformation $T$, $\mathcal{K}(T)$, is a subspace of $U$, while the null space of the matrix representation, $\mathcal{N}(M^T_{B,C})$, is a subspace of $\mathbb{C}^n$. The function $\rho_B$ is defined as a function from $U$ to $\mathbb{C}^n$, but we can just as well employ the definition of $\rho_B$ as a function from $\mathcal{K}(T)$ to $\mathcal{N}(M^T_{B,C})$.

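We must first ensure that if we choose an input for $\rho_B$ from $\mathcal{K}(T)$, then the output will be an element of $\mathcal{N}(M^T_{B,C})$. So suppose that $\mathbf{u} \in \mathcal{K}(T)$. Then

\begin{align*}
M^T_{B,C}\,\rho_B(\mathbf{u}) &= \rho_C(T(\mathbf{u})) && \text{Theorem FTMR} \\
&= \rho_C(\mathbf{0}) && \text{Definition KLT} \\
&= \mathbf{0} && \text{Theorem LTTZZ}
\end{align*}

This says that $\rho_B(\mathbf{u}) \in \mathcal{N}(M^T_{B,C})$, as desired.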

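The restriction in the size of the domain and codomain of $\rho_B$ will not affect the fact that $\rho_B$ is a linear transformation (Theorem VRLT), nor will it affect the fact that $\rho_B$ is injective (Theorem VRI). Something must be done though to verify that $\rho_B$ is surjective. To this end, appeal to the definition of surjective (Definition SLT), and suppose that we have an element of the codomain, $\mathbf{x} \in \mathcal{N}(M^T_{B,C}) \subseteq \mathbb{C}^n$, and we wish to find an element of the domain with $\mathbf{x}$ as its image. We now show that the desired element of the domain is $\mathbf{u} = \rho_B^{-1}(\mathbf{x})$. First, verify that $\mathbf{u} \in \mathcal{K}(T)$,

\begin{align*}
T(\mathbf{u}) &= T(\rho_B^{-1}(\mathbf{x})) \\
&= \rho_C^{-1}\left(M^T_{B,C}\,\rho_B(\rho_B^{-1}(\mathbf{x}))\right) && \text{Theorem FTMR} \\
&= \rho_C^{-1}\left(M^T_{B,C}\,I_{\mathbb{C}^n}(\mathbf{x})\right) && \text{Definition IVLT} \\
&= \rho_C^{-1}\left(M^T_{B,C}\,\mathbf{x}\right) && \text{Definition IDLT} \\
&= \rho_C^{-1}\left(\mathbf{0}_{\mathbb{C}^m}\right) && \text{Definition KLT} \\
&= \mathbf{0}_V && \text{Theorem LTTZZ}
\end{align*}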


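Second, verify that the proposed isomorphism, $\rho_B$, takes $\mathbf{u}$ to $\mathbf{x}$,

\begin{align*}
\rho_B(\mathbf{u}) &= \rho_B(\rho_B^{-1}(\mathbf{x})) && \text{Substitution} \\
&= I_{\mathbb{C}^n}(\mathbf{x}) && \text{Definition IVLT} \\
&= \mathbf{x} && \text{Definition IDLT}
\end{align*}

With $\rho_B$ demonstrated to be an injective and surjective linear transformation from $\mathcal{K}(T)$ to $\mathcal{N}(M^T_{B,C})$, Theorem ILTIS tells us $\rho_B$ is invertible, and so by Definition IVS, we say $\mathcal{K}(T)$ and $\mathcal{N}(M^T_{B,C})$ are isomorphic. ■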

Example  KVMR  Kernel  via  matrix  representation 

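Consider the kernel of the linear transformation, $T: M_{22} \to P_2$, given by

\[ T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = (2a - b + c - 5d) + (a + 4b + 5c + 2d)x + (3a - 2b + c - 8d)x^2 \]

We will begin with a matrix representation of $T$ relative to the bases for $M_{22}$ and $P_2$ (respectively),

\begin{align*}
B &= \left\{
\begin{bmatrix} 1 & 2 \\ -1 & -1 \end{bmatrix},\,
\begin{bmatrix} 1 & 3 \\ -1 & -4 \end{bmatrix},\,
\begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix},\,
\begin{bmatrix} 2 & 5 \\ -2 & -4 \end{bmatrix}
\right\} \\
C &= \left\{ 1 + x + x^2,\ 2 + 3x,\ -1 - 2x^2 \right\}
\end{align*}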
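Then,

\begin{align*}
\rho_C\left(T\left(\begin{bmatrix} 1 & 2 \\ -1 & -1 \end{bmatrix}\right)\right)
&= \rho_C\left(4 + 2x + 6x^2\right) \\
&= \rho_C\left(2(1 + x + x^2) + 0(2 + 3x) + (-2)(-1 - 2x^2)\right)
 = \begin{bmatrix} 2 \\ 0 \\ -2 \end{bmatrix} \\
\rho_C\left(T\left(\begin{bmatrix} 1 & 3 \\ -1 & -4 \end{bmatrix}\right)\right)
&= \rho_C\left(18 + 28x^2\right) \\
&= \rho_C\left((-24)(1 + x + x^2) + 8(2 + 3x) + (-26)(-1 - 2x^2)\right)
 = \begin{bmatrix} -24 \\ 8 \\ -26 \end{bmatrix} \\
\rho_C\left(T\left(\begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix}\right)\right)
&= \rho_C\left(10 + 5x + 15x^2\right) \\
&= \rho_C\left(5(1 + x + x^2) + 0(2 + 3x) + (-5)(-1 - 2x^2)\right)
 = \begin{bmatrix} 5 \\ 0 \\ -5 \end{bmatrix} \\
\rho_C\left(T\left(\begin{bmatrix} 2 & 5 \\ -2 & -4 \end{bmatrix}\right)\right)
&= \rho_C\left(17 + 4x + 26x^2\right) \\
&= \rho_C\left((-8)(1 + x + x^2) + 4(2 + 3x) + (-17)(-1 - 2x^2)\right)
 = \begin{bmatrix} -8 \\ 4 \\ -17 \end{bmatrix}
\end{align*}
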

So the matrix representation of $T$ (relative to $B$ and $C$) is

\[ M^T_{B,C} = \begin{bmatrix} 2 & -24 & 5 & -8 \\ 0 & 8 & 0 & 4 \\ -2 & -26 & -5 & -17 \end{bmatrix} \]

We know from Theorem KNSI that the kernel of the linear transformation $T$ is isomorphic to the null space of the matrix representation $M^T_{B,C}$, and by studying the proof of Theorem KNSI we learn that $\rho_B$ is an isomorphism between these null spaces. Rather than trying to compute the kernel of $T$ using definitions and techniques from Chapter LT we will instead analyze the null space of $M^T_{B,C}$ using techniques from way back in Chapter V. First row-reduce $M^T_{B,C}$,

\[
\begin{bmatrix} 2 & -24 & 5 & -8 \\ 0 & 8 & 0 & 4 \\ -2 & -26 & -5 & -17 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 0 & \frac{5}{2} & 2 \\ 0 & 1 & 0 & \frac{1}{2} \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]


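So, by Theorem BNS, a basis for $\mathcal{N}(M^T_{B,C})$ is

\[
\left\{
\begin{bmatrix} -\frac{5}{2} \\ 0 \\ 1 \\ 0 \end{bmatrix},\,
\begin{bmatrix} -2 \\ -\frac{1}{2} \\ 0 \\ 1 \end{bmatrix}
\right\}
\]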

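We can now convert this basis of $\mathcal{N}(M^T_{B,C})$ into a basis of $\mathcal{K}(T)$ by applying $\rho_B^{-1}$ to each element of the basis,

\begin{align*}
\rho_B^{-1}\left(\begin{bmatrix} -\frac{5}{2} \\ 0 \\ 1 \\ 0 \end{bmatrix}\right)
&= \left(-\tfrac{5}{2}\right)\begin{bmatrix} 1 & 2 \\ -1 & -1 \end{bmatrix}
 + 0\begin{bmatrix} 1 & 3 \\ -1 & -4 \end{bmatrix}
 + 1\begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix}
 + 0\begin{bmatrix} 2 & 5 \\ -2 & -4 \end{bmatrix}
 = \begin{bmatrix} -\frac{3}{2} & -3 \\ \frac{5}{2} & \frac{1}{2} \end{bmatrix} \\
\rho_B^{-1}\left(\begin{bmatrix} -2 \\ -\frac{1}{2} \\ 0 \\ 1 \end{bmatrix}\right)
&= (-2)\begin{bmatrix} 1 & 2 \\ -1 & -1 \end{bmatrix}
 + \left(-\tfrac{1}{2}\right)\begin{bmatrix} 1 & 3 \\ -1 & -4 \end{bmatrix}
 + 0\begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix}
 + 1\begin{bmatrix} 2 & 5 \\ -2 & -4 \end{bmatrix}
 = \begin{bmatrix} -\frac{1}{2} & -\frac{1}{2} \\ \frac{1}{2} & 0 \end{bmatrix}
\end{align*}

So the set

\[
\left\{
\begin{bmatrix} -\frac{3}{2} & -3 \\ \frac{5}{2} & \frac{1}{2} \end{bmatrix},\,
\begin{bmatrix} -\frac{1}{2} & -\frac{1}{2} \\ \frac{1}{2} & 0 \end{bmatrix}
\right\}
\]

is a basis for $\mathcal{K}(T)$. Just for fun, you might evaluate $T$ with each of these two basis vectors and verify that the output is the zero polynomial (Exercise MR.C10). △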
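The last step of Example KVMR lends itself to a quick machine check. Below is a minimal sketch in plain Python using exact fractions; the function names `T` and `rho_B_inv` are our own labels for the transformation and for the inverse of vector representation relative to $B$, and polynomials are stored as lists of coefficients of $1, x, x^2$.

```python
from fractions import Fraction as F


def T(m):
    """Apply T to a 2x2 matrix [[a, b], [c, d]]; return coefficients of 1, x, x^2."""
    (a, b), (c, d) = m
    return [2*a - b + c - 5*d, a + 4*b + 5*c + 2*d, 3*a - 2*b + c - 8*d]


# The basis B of M_22 used in Example KVMR.
B = [[[1, 2], [-1, -1]],
     [[1, 3], [-1, -4]],
     [[1, 2], [0, -2]],
     [[2, 5], [-2, -4]]]


def rho_B_inv(coords):
    """Linear combination of the matrices in B with the given scalars."""
    return [[sum(s * M[i][j] for s, M in zip(coords, B)) for j in range(2)]
            for i in range(2)]


# Basis of the null space of the matrix representation (Theorem BNS).
z1 = [F(-5, 2), 0, 1, 0]
z2 = [-2, F(-1, 2), 0, 1]

k1 = rho_B_inv(z1)
k2 = rho_B_inv(z2)
print(k1, T(k1))
print(k2, T(k2))
```

Both outputs of `T` should be the zero polynomial, which is exactly what membership in the kernel requires.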

An  entirely  similar  result  applies  to  the  range  of  a linear  transformation  and  the 
column  space  of  a matrix  representation  of  the  linear  transformation. 


Theorem RCSI Range and Column Space Isomorphism
Suppose that $T: U \to V$ is a linear transformation, $B$ is a basis for $U$ of size $n$, and $C$ is a basis for $V$ of size $m$. Then the range of $T$ is isomorphic to the column space of $M^T_{B,C}$,

\[ \mathcal{R}(T) \cong \mathcal{C}(M^T_{B,C}) \]


Proof. To establish that two vector spaces are isomorphic, we must find an isomorphism between them, an invertible linear transformation (Definition IVS). The range of the linear transformation $T$, $\mathcal{R}(T)$, is a subspace of $V$, while the column space of the matrix representation, $\mathcal{C}(M^T_{B,C})$, is a subspace of $\mathbb{C}^m$. The function $\rho_C$ is defined as a function from $V$ to $\mathbb{C}^m$, but we can just as well employ the definition of $\rho_C$ as a function from $\mathcal{R}(T)$ to $\mathcal{C}(M^T_{B,C})$.

We must first ensure that if we choose an input for $\rho_C$ from $\mathcal{R}(T)$, then the output will be an element of $\mathcal{C}(M^T_{B,C})$. So suppose that $\mathbf{v} \in \mathcal{R}(T)$. Then there is a vector $\mathbf{u} \in U$, such that $T(\mathbf{u}) = \mathbf{v}$. Consider

\begin{align*}
M^T_{B,C}\,\rho_B(\mathbf{u}) &= \rho_C(T(\mathbf{u})) && \text{Theorem FTMR} \\
&= \rho_C(\mathbf{v}) && \text{Definition RLT}
\end{align*}

This says that $\rho_C(\mathbf{v}) \in \mathcal{C}(M^T_{B,C})$, as desired.

The restriction in the size of the domain and codomain will not affect the fact that $\rho_C$ is a linear transformation (Theorem VRLT), nor will it affect the fact that $\rho_C$ is injective (Theorem VRI). Something must be done though to verify that $\rho_C$ is surjective. This all gets a bit confusing, since the domain of our isomorphism is the range of the linear transformation, so think about your objects as you go. To establish that $\rho_C$ is surjective, appeal to the definition of a surjective linear transformation (Definition SLT), and suppose that we have an element of the codomain, $\mathbf{y} \in \mathcal{C}(M^T_{B,C}) \subseteq \mathbb{C}^m$, and we wish to find an element of the domain with $\mathbf{y}$ as its image. Since $\mathbf{y} \in \mathcal{C}(M^T_{B,C})$, there exists a vector, $\mathbf{x} \in \mathbb{C}^n$, with $M^T_{B,C}\,\mathbf{x} = \mathbf{y}$.

We now show that the desired element of the domain is $\mathbf{v} = \rho_C^{-1}(\mathbf{y})$. First, verify that $\mathbf{v} \in \mathcal{R}(T)$ by applying $T$ to $\mathbf{u} = \rho_B^{-1}(\mathbf{x})$,

\begin{align*}
T(\mathbf{u}) &= T(\rho_B^{-1}(\mathbf{x})) \\
&= \rho_C^{-1}\left(M^T_{B,C}\,\rho_B(\rho_B^{-1}(\mathbf{x}))\right) && \text{Theorem FTMR} \\
&= \rho_C^{-1}\left(M^T_{B,C}\,I_{\mathbb{C}^n}(\mathbf{x})\right) && \text{Definition IVLT} \\
&= \rho_C^{-1}\left(M^T_{B,C}\,\mathbf{x}\right) && \text{Definition IDLT} \\
&= \rho_C^{-1}(\mathbf{y}) && \text{Definition CSM} \\
&= \mathbf{v} && \text{Substitution}
\end{align*}

Second, verify that the proposed isomorphism, $\rho_C$, takes $\mathbf{v}$ to $\mathbf{y}$,

\begin{align*}
\rho_C(\mathbf{v}) &= \rho_C(\rho_C^{-1}(\mathbf{y})) && \text{Substitution} \\
&= I_{\mathbb{C}^m}(\mathbf{y}) && \text{Definition IVLT} \\
&= \mathbf{y} && \text{Definition IDLT}
\end{align*}

With $\rho_C$ demonstrated to be an injective and surjective linear transformation from $\mathcal{R}(T)$ to $\mathcal{C}(M^T_{B,C})$, Theorem ILTIS tells us $\rho_C$ is invertible, and so by Definition IVS, we say $\mathcal{R}(T)$ and $\mathcal{C}(M^T_{B,C})$ are isomorphic. ■


Example  RVMR  Range  via  matrix  representation 

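In this example, we will recycle the linear transformation $T$ and the bases $B$ and $C$ of Example KVMR but now we will compute the range of $T: M_{22} \to P_2$, given by

\[ T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = (2a - b + c - 5d) + (a + 4b + 5c + 2d)x + (3a - 2b + c - 8d)x^2 \]

With bases $B$ and $C$,

\begin{align*}
B &= \left\{
\begin{bmatrix} 1 & 2 \\ -1 & -1 \end{bmatrix},\,
\begin{bmatrix} 1 & 3 \\ -1 & -4 \end{bmatrix},\,
\begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix},\,
\begin{bmatrix} 2 & 5 \\ -2 & -4 \end{bmatrix}
\right\} \\
C &= \left\{ 1 + x + x^2,\ 2 + 3x,\ -1 - 2x^2 \right\}
\end{align*}

we obtain the matrix representation

\[ M^T_{B,C} = \begin{bmatrix} 2 & -24 & 5 & -8 \\ 0 & 8 & 0 & 4 \\ -2 & -26 & -5 & -17 \end{bmatrix} \]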

We know from Theorem RCSI that the range of the linear transformation $T$ is isomorphic to the column space of the matrix representation $M^T_{B,C}$, and by studying the proof of Theorem RCSI we learn that $\rho_C$ is an isomorphism between these subspaces. Notice that since the range is a subspace of the codomain, we will employ $\rho_C$ as the isomorphism, rather than $\rho_B$, which was the correct choice for an isomorphism between the null spaces of Example KVMR.

Rather than trying to compute the range of $T$ using definitions and techniques from Chapter LT we will instead analyze the column space of $M^T_{B,C}$ using techniques from way back in Chapter M. First row-reduce $\left(M^T_{B,C}\right)^t$,

\[
\begin{bmatrix} 2 & 0 & -2 \\ -24 & 8 & -26 \\ 5 & 0 & -5 \\ -8 & 4 & -17 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -\frac{25}{4} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]


Now employ Theorem CSRST and Theorem BRS (there are other methods we could choose here to compute the column space, such as Theorem BCS) to obtain the basis for $\mathcal{C}(M^T_{B,C})$,

\[
\left\{
\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 1 \\ -\frac{25}{4} \end{bmatrix}
\right\}
\]

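We can now convert this basis of $\mathcal{C}(M^T_{B,C})$ into a basis of $\mathcal{R}(T)$ by applying $\rho_C^{-1}$ to each element of the basis,

\begin{align*}
\rho_C^{-1}\left(\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}\right)
&= (1 + x + x^2) - (-1 - 2x^2) = 2 + x + 3x^2 \\
\rho_C^{-1}\left(\begin{bmatrix} 0 \\ 1 \\ -\frac{25}{4} \end{bmatrix}\right)
&= (2 + 3x) - \tfrac{25}{4}(-1 - 2x^2) = \tfrac{33}{4} + 3x + \tfrac{25}{2}x^2
\end{align*}

So the set

\[ \left\{ 2 + x + 3x^2,\ \tfrac{33}{4} + 3x + \tfrac{25}{2}x^2 \right\} \]

is a basis for $\mathcal{R}(T)$. △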
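The arithmetic in Example RVMR is easy to get wrong by hand, so here is a minimal sketch in plain Python that recomputes the two basis polynomials and confirms that each really is an output of $T$. The names `rho_C_inv` and `T` are our own, and polynomials are stored as lists of coefficients of $1, x, x^2$; the two witnesses used at the end (scalar multiples and sums of $T$ applied to the first two vectors of $B$) are one convenient choice, not the only one.

```python
from fractions import Fraction as F

# Basis C of P_2 as coefficient lists: 1 + x + x^2, 2 + 3x, -1 - 2x^2.
C = [[1, 1, 1], [2, 3, 0], [-1, 0, -2]]


def rho_C_inv(coords):
    """Linear combination of the polynomials in C with the given scalars."""
    return [sum(s * p[k] for s, p in zip(coords, C)) for k in range(3)]


def T(m):
    """Apply T to a 2x2 matrix [[a, b], [c, d]]; return coefficients of 1, x, x^2."""
    (a, b), (c, d) = m
    return [2*a - b + c - 5*d, a + 4*b + 5*c + 2*d, 3*a - 2*b + c - 8*d]


# Images of the column space basis vectors under the inverse of rho_C.
r1 = rho_C_inv([1, 0, -1])
r2 = rho_C_inv([0, 1, F(-25, 4)])

# Outputs of T on the first two matrices of the basis B.
t1 = T([[1, 2], [-1, -1]])
t2 = T([[1, 3], [-1, -4]])

# r1 and r2 expressed as linear combinations of outputs of T.
w1 = [F(v, 2) for v in t1]
w2 = [F(12 * v1 + v2, 8) for v1, v2 in zip(t1, t2)]
print(r1, r2)
print(w1 == r1, w2 == r2)
```

Because $T$ is linear, any linear combination of outputs of $T$ is again an output of $T$, so the final comparison shows both polynomials lie in the range.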


Theorem  KNSI  and  Theorem  RCSI  can  be  viewed  as  further  formal  evidence  for 
the  Coordinatization  Principle,  though  they  are  not  direct  consequences. 

Diagram  KRI  is  meant  to  suggest  Theorem  KNSI  and  Theorem  RCSI,  in  addition 
to  their  proofs  (and  so  carry  the  same  notation  as  the  statements  of  these  two 
theorems).  The  dashed  lines  indicate  a subspace  relationship,  with  the  smaller  vector 
space  lower  down  in  the  diagram.  The  central  square  is  highly  reminiscent  of  Diagram 
FTMR.  Each  of  the  four  vector  representations  is  an  isomorphism,  so  the  inverse 
linear  transformation  could  be  depicted  with  an  arrow  pointing  in  the  other  direction. 
The  four  vector  spaces  across  the  bottom  are  familiar  from  the  earliest  days  of  the 
course,  while  the  four  vector  spaces  across  the  top  are  completely  abstract.  The 
vector  representations  that  are  restrictions  (far  left  and  far  right)  are  the  functions 
shown  to  be  invertible  representations  as  the  key  technique  in  the  proofs  of  Theorem 
KNSI  and  Theorem  RCSI.  So  this  diagram  could  be  helpful  as  you  study  those  two 
proofs. 


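[Diagram KRI: Kernel and Range Isomorphisms. On the far left, the restriction $\rho_B|_{\mathcal{K}(T)}$ carries $\mathcal{K}(T)$ to $\mathcal{N}(M^T_{B,C})$; on the far right, the restriction $\rho_C|_{\mathcal{R}(T)}$ carries $\mathcal{R}(T)$ to $\mathcal{C}(M^T_{B,C})$.]

Subsection IVLT
Invertible Linear Transformations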


We have seen, both in theorems and in examples, that questions about linear transformations are often equivalent to questions about matrices. It is the matrix representation of a linear transformation that makes this idea precise. Here is our final theorem that solidifies this connection.

Theorem IMR Invertible Matrix Representations
Suppose that $T: U \to V$ is a linear transformation, $B$ is a basis for $U$ and $C$ is a basis for $V$. Then $T$ is an invertible linear transformation if and only if the matrix representation of $T$ relative to $B$ and $C$, $M^T_{B,C}$, is an invertible matrix. When $T$ is invertible,

\[ M^{T^{-1}}_{C,B} = \left(M^T_{B,C}\right)^{-1} \]


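Proof. ($\Rightarrow$) Suppose $T$ is invertible, so the inverse linear transformation $T^{-1}: V \to U$ exists (Definition IVLT). Both linear transformations have matrix representations relative to the bases of $U$ and $V$, namely $M^T_{B,C}$ and $M^{T^{-1}}_{C,B}$ (Definition MR).

Then

\begin{align*}
M^{T^{-1}}_{C,B}\,M^T_{B,C} &= M^{T^{-1} \circ T}_{B,B} && \text{Theorem MRCLT} \\
&= M^{I_U}_{B,B} && \text{Definition IVLT} \\
&= \left[\rho_B(I_U(\mathbf{u}_1)) \mid \rho_B(I_U(\mathbf{u}_2)) \mid \cdots \mid \rho_B(I_U(\mathbf{u}_n))\right] && \text{Definition MR} \\
&= \left[\rho_B(\mathbf{u}_1) \mid \rho_B(\mathbf{u}_2) \mid \rho_B(\mathbf{u}_3) \mid \cdots \mid \rho_B(\mathbf{u}_n)\right] && \text{Definition IDLT} \\
&= \left[\mathbf{e}_1 \mid \mathbf{e}_2 \mid \mathbf{e}_3 \mid \cdots \mid \mathbf{e}_n\right] && \text{Definition VR} \\
&= I_n && \text{Definition IM}
\end{align*}

And

\begin{align*}
M^T_{B,C}\,M^{T^{-1}}_{C,B} &= M^{T \circ T^{-1}}_{C,C} && \text{Theorem MRCLT} \\
&= M^{I_V}_{C,C} && \text{Definition IVLT} \\
&= \left[\rho_C(I_V(\mathbf{v}_1)) \mid \rho_C(I_V(\mathbf{v}_2)) \mid \cdots \mid \rho_C(I_V(\mathbf{v}_n))\right] && \text{Definition MR} \\
&= \left[\rho_C(\mathbf{v}_1) \mid \rho_C(\mathbf{v}_2) \mid \rho_C(\mathbf{v}_3) \mid \cdots \mid \rho_C(\mathbf{v}_n)\right] && \text{Definition IDLT} \\
&= \left[\mathbf{e}_1 \mid \mathbf{e}_2 \mid \mathbf{e}_3 \mid \cdots \mid \mathbf{e}_n\right] && \text{Definition VR} \\
&= I_n && \text{Definition IM}
\end{align*}

These two equations show that $M^T_{B,C}$ and $M^{T^{-1}}_{C,B}$ are inverse matrices (Definition MI) and establish that when $T$ is invertible, then $M^{T^{-1}}_{C,B} = \left(M^T_{B,C}\right)^{-1}$.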

($\Leftarrow$) Suppose now that $M^T_{B,C}$ is an invertible matrix and hence nonsingular (Theorem NI). We compute the nullity of $T$,

\begin{align*}
n(T) &= \dim(\mathcal{K}(T)) && \text{Definition KLT} \\
&= \dim\left(\mathcal{N}(M^T_{B,C})\right) && \text{Theorem KNSI} \\
&= n\left(M^T_{B,C}\right) && \text{Definition NOM} \\
&= 0 && \text{Theorem RNNM}
\end{align*}

So the kernel of $T$ is trivial, and by Theorem KILT, $T$ is injective.
We now compute the rank of $T$,

\begin{align*}
r(T) &= \dim(\mathcal{R}(T)) && \text{Definition RLT} \\
&= \dim\left(\mathcal{C}(M^T_{B,C})\right) && \text{Theorem RCSI} \\
&= r\left(M^T_{B,C}\right) && \text{Definition ROM} \\
&= \dim(V) && \text{Theorem RNNM}
\end{align*}

Since the dimension of the range of $T$ equals the dimension of the codomain $V$, by Theorem EDYES, $\mathcal{R}(T) = V$, which says that $T$ is surjective by Theorem RSLT. Because $T$ is both injective and surjective, by Theorem ILTIS, $T$ is invertible. ■


By  now,  the  connections  between  matrices  and  linear  transformations  should 
be  starting  to  become  more  transparent,  and  you  may  have  already  recognized  the 
invertibility  of  a matrix  as  being  tantamount  to  the  invertibility  of  the  associated 
matrix  representation.  The  next  example  shows  how  to  apply  this  theorem  to 
the  problem  of  actually  building  a formula  for  the  inverse  of  an  invertible  linear 
transformation. 


Example  ILTVR  Inverse  of  a linear  transformation  via  a representation 
Consider the linear transformation

\[
R: P_3 \to M_{22}, \quad
R\left(a + bx + cx^2 + dx^3\right) =
\begin{bmatrix} a + b - c + 2d & 2a + 3b - 2c + 3d \\ a + b + 2d & -a + b + 2c - 5d \end{bmatrix}
\]

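If we wish to quickly find a formula for the inverse of $R$ (presuming it exists), then choosing "nice" bases will work best. So build a matrix representation of $R$ relative to the bases $B$ and $C$,

\begin{align*}
B &= \left\{ 1,\ x,\ x^2,\ x^3 \right\} \\
C &= \left\{
\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\,
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\,
\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},\,
\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
\right\}
\end{align*}

Then,

\begin{align*}
\rho_C(R(1)) &= \rho_C\left(\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}\right) = \begin{bmatrix} 1 \\ 2 \\ 1 \\ -1 \end{bmatrix} &
\rho_C(R(x)) &= \rho_C\left(\begin{bmatrix} 1 & 3 \\ 1 & 1 \end{bmatrix}\right) = \begin{bmatrix} 1 \\ 3 \\ 1 \\ 1 \end{bmatrix} \\
\rho_C\left(R(x^2)\right) &= \rho_C\left(\begin{bmatrix} -1 & -2 \\ 0 & 2 \end{bmatrix}\right) = \begin{bmatrix} -1 \\ -2 \\ 0 \\ 2 \end{bmatrix} &
\rho_C\left(R(x^3)\right) &= \rho_C\left(\begin{bmatrix} 2 & 3 \\ 2 & -5 \end{bmatrix}\right) = \begin{bmatrix} 2 \\ 3 \\ 2 \\ -5 \end{bmatrix}
\end{align*}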

So a representation of $R$ is

\[
M^R_{B,C} = \begin{bmatrix} 1 & 1 & -1 & 2 \\ 2 & 3 & -2 & 3 \\ 1 & 1 & 0 & 2 \\ -1 & 1 & 2 & -5 \end{bmatrix}
\]


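The matrix $M^R_{B,C}$ is invertible (as you can check) so we know for sure that $R$ is invertible by Theorem IMR. Furthermore,

\[
M^{R^{-1}}_{C,B} = \left(M^R_{B,C}\right)^{-1}
= \begin{bmatrix} 1 & 1 & -1 & 2 \\ 2 & 3 & -2 & 3 \\ 1 & 1 & 0 & 2 \\ -1 & 1 & 2 & -5 \end{bmatrix}^{-1}
= \begin{bmatrix} 20 & -7 & -2 & 3 \\ -8 & 3 & 1 & -1 \\ -1 & 0 & 1 & 0 \\ -6 & 2 & 1 & -1 \end{bmatrix}
\]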

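We can use this representation of the inverse linear transformation, in concert with Theorem FTMR, to determine an explicit formula for the inverse itself,

\begin{align*}
R^{-1}\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right)
&= \rho_B^{-1}\left(M^{R^{-1}}_{C,B}\,\rho_C\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right)\right) && \text{Theorem FTMR} \\
&= \rho_B^{-1}\left(\left(M^R_{B,C}\right)^{-1}\rho_C\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right)\right) && \text{Theorem IMR} \\
&= \rho_B^{-1}\left(\left(M^R_{B,C}\right)^{-1}\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}\right) && \text{Definition VR} \\
&= \rho_B^{-1}\left(\begin{bmatrix} 20 & -7 & -2 & 3 \\ -8 & 3 & 1 & -1 \\ -1 & 0 & 1 & 0 \\ -6 & 2 & 1 & -1 \end{bmatrix}\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}\right) && \text{Definition MI} \\
&= \rho_B^{-1}\left(\begin{bmatrix} 20a - 7b - 2c + 3d \\ -8a + 3b + c - d \\ -a + c \\ -6a + 2b + c - d \end{bmatrix}\right) && \text{Definition MVP} \\
&= (20a - 7b - 2c + 3d) + (-8a + 3b + c - d)x \\
&\quad + (-a + c)x^2 + (-6a + 2b + c - d)x^3 && \text{Definition VR}
\end{align*}

△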
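A formula for an inverse is only convincing once both compositions return their inputs. The sketch below, in plain Python, encodes $R$ and the derived formula for $R^{-1}$ (the function names `R` and `R_inv` and the sample inputs are our own choices) and checks both compositions on sample data. Polynomials are lists of coefficients of $1, x, x^2, x^3$, and a $2 \times 2$ matrix is flattened row by row into a list of four entries.

```python
def R(p):
    """R applied to a + bx + cx^2 + dx^3; returns the 2x2 matrix entries row by row."""
    a, b, c, d = p
    return [a + b - c + 2*d,
            2*a + 3*b - 2*c + 3*d,
            a + b + 2*d,
            -a + b + 2*c - 5*d]


def R_inv(m):
    """Formula for the inverse derived in Example ILTVR; returns polynomial coefficients."""
    a, b, c, d = m
    return [20*a - 7*b - 2*c + 3*d,
            -8*a + 3*b + c - d,
            -a + c,
            -6*a + 2*b + c - d]


p = [3, -1, 4, 2]      # the polynomial 3 - x + 4x^2 + 2x^3
m = [1, 0, 0, 0]       # the matrix with a 1 in the upper left corner

print(R_inv(R(p)) == p)
print(R(R_inv(m)) == m)
```

Checking a handful of inputs is not a proof, but because both maps are linear, checking each composition on a basis would be; the four standard basis elements on each side are a natural extension of this sketch.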


You  might  look  back  at  Example  AIVLT,  where  we  first  witnessed  the  inverse  of 
a linear  transformation  and  recognize  that  the  inverse  (S)  was  built  from  using  the 
method  of  Example  ILTVR  with  a matrix  representation  of  T. 

Theorem IMILT Invertible Matrices, Invertible Linear Transformation
Suppose that $A$ is a square matrix of size $n$ and $T: \mathbb{C}^n \to \mathbb{C}^n$ is the linear transformation defined by $T(\mathbf{x}) = A\mathbf{x}$. Then $A$ is an invertible matrix if and only if $T$ is an invertible linear transformation.


Proof. Choose bases $B = C = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \ldots, \mathbf{e}_n\}$ consisting of the standard unit vectors as a basis of $\mathbb{C}^n$ (Theorem SUVB) and build a matrix representation of $T$ relative to $B$ and $C$. Then

\begin{align*}
\rho_C(T(\mathbf{e}_i)) &= \rho_C(A\mathbf{e}_i) \\
&= \rho_C(\mathbf{A}_i) \\
&= \mathbf{A}_i
\end{align*}

So then the matrix representation of $T$, relative to $B$ and $C$, is simply $M^T_{B,C} = A$. With this observation, the proof becomes a specialization of Theorem IMR,

\[ T \text{ is invertible} \iff M^T_{B,C} \text{ is invertible} \iff A \text{ is invertible} \]


This  theorem  may  seem  gratuitous.  Why  state  such  a special  case  of  Theorem 
IMR?  Because  it  adds  another  condition  to  our  NMEx  series  of  theorems,  and  in 
some  ways  it  is  the  most  fundamental  expression  of  what  it  means  for  a matrix  to 
be  nonsingular  — the  associated  linear  transformation  is  invertible.  This  is  our  final 
update. 

Theorem  NME9  Nonsingular  Matrix  Equivalences,  Round  9 
Suppose  that  A is  a square  matrix  of  size  n.  The  following  are  equivalent. 

1. $A$ is nonsingular.

2. $A$ row-reduces to the identity matrix.

3. The null space of $A$ contains only the zero vector, $\mathcal{N}(A) = \{\mathbf{0}\}$.

4. The linear system $\mathcal{LS}(A, \mathbf{b})$ has a unique solution for every possible choice of $\mathbf{b}$.

5. The columns of $A$ are a linearly independent set.

6. $A$ is invertible.

7. The column space of $A$ is $\mathbb{C}^n$, $\mathcal{C}(A) = \mathbb{C}^n$.

8. The columns of $A$ are a basis for $\mathbb{C}^n$.

9. The rank of $A$ is $n$, $r(A) = n$.

10. The nullity of $A$ is zero, $n(A) = 0$.

11. The determinant of $A$ is nonzero, $\det(A) \neq 0$.

12. $\lambda = 0$ is not an eigenvalue of $A$.

13. The linear transformation $T: \mathbb{C}^n \to \mathbb{C}^n$ defined by $T(\mathbf{x}) = A\mathbf{x}$ is invertible.

Proof.  By  Theorem  IMILT,  the  new  addition  to  this  list  is  equivalent  to  the  statement 
that  A is  invertible,  so  we  can  expand  Theorem  NME8.  ■ 

Reading  Questions 


1.  Why  does  Theorem  FTMR  deserve  the  moniker  “fundamental”? 

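2. Find the matrix representation, $M^T_{B,C}$, of the linear transformation

\[
T: \mathbb{C}^2 \to \mathbb{C}^2, \quad
T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 - x_2 \\ 3x_1 + 2x_2 \end{bmatrix}
\]

relative to the bases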

C = 


1 

1 


3.  What  is  the  second  “surprise,”  and  why  is  it  surprising? 


Exercises 

C10 Example KVMR concludes with a basis for the kernel of the linear transformation $T$. Compute the value of $T$ for each of these two basis vectors. Did you get what you expected?

C20† Compute the matrix representation of $T$ relative to the bases $B$ and $C$.

\[
T: P_3 \to \mathbb{C}^3, \quad
T\left(a + bx + cx^2 + dx^3\right) = \begin{bmatrix} 2a - 3b + 4c - 2d \\ a + b - c + d \\ 3a + 2c - 3d \end{bmatrix}
\]

\[
B = \left\{ 1,\ x,\ x^2,\ x^3 \right\} \qquad
C = \left\{
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},\,
\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},\,
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
\right\}
\]

C21† Find a matrix representation of the linear transformation $T$ relative to the bases $B$ and $C$.

\[
T: P_2 \to \mathbb{C}^2, \quad
T(p(x)) = \begin{bmatrix} p(1) \\ p(3) \end{bmatrix}
\]

\[ B = \left\{ 2 - 5x + x^2,\ 1 + x - x^2,\ x^2 \right\} \]


C = 


2 

3 


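C22† Let $S_{22}$ be the vector space of $2 \times 2$ symmetric matrices. Build the matrix representation of the linear transformation $T: P_2 \to S_{22}$ relative to the bases $B$ and $C$ and then use this matrix representation to compute $T\left(3 + 5x - 2x^2\right)$.

\[
B = \left\{ 1,\ 1 + x,\ 1 + x + x^2 \right\} \qquad
C = \left\{
\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\,
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},\,
\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
\right\}
\]

\[
T\left(a + bx + cx^2\right) = \begin{bmatrix} 2a - b + c & a + 3b - c \\ a + 3b - c & a - c \end{bmatrix}
\]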


C25† Use a matrix representation to determine if the linear transformation $T: P_3 \to M_{22}$ is surjective.

\[
T\left(a + bx + cx^2 + dx^3\right) = \begin{bmatrix} -a + 4b + c + 2d & 4a - b + 6c - d \\ a + 5b - 2c + 2d & a + 2c + 5d \end{bmatrix}
\]

C30† Find bases for the kernel and range of the linear transformation $S$ below.

\[
S: M_{22} \to P_2, \quad
S\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = (a + 2b + 5c - 4d) + (3a - b + 8c + 2d)x + (a + b + 4c - 2d)x^2
\]

C40† Let $S_{22}$ be the set of $2 \times 2$ symmetric matrices. Verify that the linear transformation $R$ is invertible and find $R^{-1}$.

\[
R: S_{22} \to P_2, \quad
R\left(\begin{bmatrix} a & b \\ b & c \end{bmatrix}\right) = (a - b) + (2a - 3b - 2c)x + (a - b + c)x^2
\]

C41† Prove that the linear transformation $S$ is invertible. Then find a formula for the inverse linear transformation, $S^{-1}$, by employing a matrix inverse.

\[
S: P_1 \to M_{12}, \quad
S(a + bx) = \begin{bmatrix} 3a + b & 2a + b \end{bmatrix}
\]

C42† The linear transformation $R: M_{12} \to M_{21}$ is invertible. Use a matrix representation to determine a formula for the inverse linear transformation $R^{-1}: M_{21} \to M_{12}$.

\[
R\left(\begin{bmatrix} a & b \end{bmatrix}\right) = \begin{bmatrix} a + 3b \\ 4a + 11b \end{bmatrix}
\]


C50† Use a matrix representation to find a basis for the range of the linear transformation $L$.

\[
L: M_{22} \to P_2, \quad
L\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = (a + 2b + 4c + d) + (3a + c - 2d)x + (-a + b + 3c + 3d)x^2
\]

C51 Use a matrix representation to find a basis for the kernel of the linear transformation $L$.

\[
L: M_{22} \to P_2, \quad
L\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = (a + 2b + 4c + d) + (3a + c - 2d)x + (-a + b + 3c + 3d)x^2
\]

C52† Find a basis for the kernel of the linear transformation $T: P_2 \to M_{22}$.

\[
T\left(a + bx + cx^2\right) = \begin{bmatrix} a + 2b - 2c & 2a + 2b \\ -a + b - 4c & 3a + 2b + 2c \end{bmatrix}
\]

M20† The linear transformation $D$ performs differentiation on polynomials. Use a matrix representation of $D$ to find the rank and nullity of $D$.

\[ D: P_n \to P_n, \quad D(p(x)) = p'(x) \]

M60 Suppose $U$ and $V$ are vector spaces and define a function $Z: U \to V$ by $Z(\mathbf{u}) = \mathbf{0}_V$ for every $\mathbf{u} \in U$. Then Exercise IVLT.M60 asks you to formulate the theorem: $Z$ is invertible if and only if $U = \{\mathbf{0}_U\}$ and $V = \{\mathbf{0}_V\}$. What would a matrix representation of $Z$ look like in this case? How does Theorem IMR read in this case?

M80  In  light  of  Theorem  KNSI  and  Theorem  MRCLT,  write  a short  comparison  of 
Exercise  MM.T40  with  Exercise  ILT.T15. 

M81  In  light  of  Theorem  RCSI  and  Theorem  MRCLT,  write  a short  comparison  of 
Exercise  CRS.T40  with  Exercise  SLT.T15. 

M82  In  light  of  Theorem  MRCLT  and  Theorem  IMR,  write  a short  comparison  of 
Theorem  SS  and  Theorem  ICLT. 

M83  In  light  of  Theorem  MRCLT  and  Theorem  IMR,  write  a short  comparison  of 
Theorem  NPNT  and  Exercise  IVLT.T40. 

T20† Construct a new solution to Exercise B.T50 along the following outline. From the $n \times n$ matrix $A$, construct the linear transformation $T: \mathbb{C}^n \to \mathbb{C}^n$, $T(\mathbf{x}) = A\mathbf{x}$. Use Theorem NI, Theorem IMILT and Theorem ILTIS to translate between the nonsingularity of $A$ and the surjectivity/injectivity of $T$. Then apply Theorem ILTB and Theorem SLTB to connect these properties with bases.

T40 Theorem VSLT defines the vector space $\mathcal{LT}(U, V)$ containing all linear transformations with domain $U$ and codomain $V$. Suppose $\dim(U) = n$ and $\dim(V) = m$. Prove that $\mathcal{LT}(U, V)$ is isomorphic to $M_{mn}$, the vector space of all $m \times n$ matrices (Example VSM). (Hint: we could have suggested this exercise in Chapter LT, but have postponed it to this section. Why?)

T41 Theorem VSLT defines the vector space $\mathcal{LT}(U, V)$ containing all linear transformations with domain $U$ and codomain $V$. Determine a basis for $\mathcal{LT}(U, V)$. (Hint: study Exercise MR.T40 first.)

T60 Create an entirely different proof of Theorem IMILT that relies on Definition IVLT to establish the invertibility of $T$, and that relies on Definition MI to establish the invertibility of $A$.

T80† Suppose that $T: U \to V$ and $S: V \to W$ are linear transformations, and that $B$, $C$ and $D$ are bases for $U$, $V$, and $W$. Using only Definition MR define matrix representations for $T$ and $S$. Using these two definitions, and Definition MR, derive a matrix representation for the composition $S \circ T$ in terms of the entries of the matrices $M^T_{B,C}$ and $M^S_{C,D}$. Explain how you would use this result to motivate a definition for matrix multiplication that is strikingly similar to Theorem EMP.


Section  CB 
Change  of  Basis 

We have seen in Section MR that a linear transformation can be represented by a matrix, once we pick bases for the domain and codomain. How does the matrix representation change if we choose different bases? Which bases lead to especially nice representations? From the infinite possibilities, what is the best possible representation? This section will begin to answer these questions. But first we need to define eigenvalues for linear transformations and the change-of-basis matrix.


Subsection  EELT 

Eigenvalues  and  Eigenvectors  of  Linear  Transformations 


We now define the notion of an eigenvalue and eigenvector of a linear transformation. It should not be too surprising, especially if you remind yourself of the close relationship between matrices and linear transformations.


Definition EELT Eigenvalue and Eigenvector of a Linear Transformation
Suppose that $T: V \to V$ is a linear transformation. Then a nonzero vector $\mathbf{v} \in V$ is an eigenvector of $T$ for the eigenvalue $\lambda$ if $T(\mathbf{v}) = \lambda\mathbf{v}$. □

We  will  see  shortly  the  best  method  for  computing  the  eigenvalues  and  eigenvectors 
of  a linear  transformation,  but  for  now,  here  are  some  examples  to  verify  that  such 
things  really  do  exist. 


Example  ELTBM  Eigenvectors  of  linear  transformation  between  matrices 
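Consider the linear transformation $T: M_{22} \to M_{22}$ defined by

\[
T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) =
\begin{bmatrix}
-17a + 11b + 8c - 11d & -57a + 35b + 24c - 33d \\
-14a + 10b + 6c - 10d & -41a + 25b + 16c - 23d
\end{bmatrix}
\]

and the vectors

\[
\mathbf{x}_1 = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix} \qquad
\mathbf{x}_2 = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \qquad
\mathbf{x}_3 = \begin{bmatrix} 1 & 3 \\ 2 & 3 \end{bmatrix} \qquad
\mathbf{x}_4 = \begin{bmatrix} 2 & 6 \\ 1 & 4 \end{bmatrix}
\]

Then compute

\begin{align*}
T(\mathbf{x}_1) &= T\left(\begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}\right) = \begin{bmatrix} 0 & 2 \\ 0 & 2 \end{bmatrix} = 2\mathbf{x}_1 \\
T(\mathbf{x}_2) &= T\left(\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}\right) = \begin{bmatrix} 2 & 2 \\ 2 & 0 \end{bmatrix} = 2\mathbf{x}_2 \\
T(\mathbf{x}_3) &= T\left(\begin{bmatrix} 1 & 3 \\ 2 & 3 \end{bmatrix}\right) = \begin{bmatrix} -1 & -3 \\ -2 & -3 \end{bmatrix} = (-1)\mathbf{x}_3 \\
T(\mathbf{x}_4) &= T\left(\begin{bmatrix} 2 & 6 \\ 1 & 4 \end{bmatrix}\right) = \begin{bmatrix} -4 & -12 \\ -2 & -8 \end{bmatrix} = (-2)\mathbf{x}_4
\end{align*}

So $\mathbf{x}_1$, $\mathbf{x}_2$, $\mathbf{x}_3$, $\mathbf{x}_4$ are eigenvectors of $T$ with eigenvalues (respectively) $\lambda_1 = 2$, $\lambda_2 = 2$, $\lambda_3 = -1$, $\lambda_4 = -2$. △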
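Definition EELT is straightforward to test numerically: apply the transformation and compare with a scalar multiple of the input. The following minimal sketch in plain Python does this for the four matrices of Example ELTBM; the helper names `T` and `scale` are our own.

```python
def T(m):
    """The linear transformation of Example ELTBM applied to [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return [[-17*a + 11*b + 8*c - 11*d, -57*a + 35*b + 24*c - 33*d],
            [-14*a + 10*b + 6*c - 10*d, -41*a + 25*b + 16*c - 23*d]]


def scale(lam, m):
    """Scalar multiple of a matrix stored as a list of rows."""
    return [[lam * entry for entry in row] for row in m]


# (eigenvalue, eigenvector) pairs claimed in the example.
pairs = [(2, [[0, 1], [0, 1]]),
         (2, [[1, 1], [1, 0]]),
         (-1, [[1, 3], [2, 3]]),
         (-2, [[2, 6], [1, 4]])]

checks = [T(x) == scale(lam, x) for lam, x in pairs]
print(checks)
```

Each comparison is the equation $T(\mathbf{x}) = \lambda\mathbf{x}$ from the definition, evaluated in exact integer arithmetic.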

Here  is  another. 

Example  ELTBP  Eigenvectors  of  linear  transformation  between  polynomials 
Consider the linear transformation $R: P_2 \to P_2$ defined by

\[ R\left(a + bx + cx^2\right) = (15a + 8b - 4c) + (-12a - 6b + 3c)x + (24a + 14b - 7c)x^2 \]

and the vectors

\[ \mathbf{w}_1 = 1 - x + x^2 \qquad \mathbf{w}_2 = x + 2x^2 \qquad \mathbf{w}_3 = 1 + 4x^2 \]

Then compute

\begin{align*}
R(\mathbf{w}_1) &= R\left(1 - x + x^2\right) = 3 - 3x + 3x^2 = 3\mathbf{w}_1 \\
R(\mathbf{w}_2) &= R\left(x + 2x^2\right) = 0 + 0x + 0x^2 = 0\mathbf{w}_2 \\
R(\mathbf{w}_3) &= R\left(1 + 4x^2\right) = -1 - 4x^2 = (-1)\mathbf{w}_3
\end{align*}

So $\mathbf{w}_1$, $\mathbf{w}_2$, $\mathbf{w}_3$ are eigenvectors of $R$ with eigenvalues (respectively) $\lambda_1 = 3$, $\lambda_2 = 0$, $\lambda_3 = -1$. Notice how the eigenvalue $\lambda_2 = 0$ indicates that the eigenvector $\mathbf{w}_2$ is a nontrivial element of the kernel of $R$, and therefore $R$ is not injective (Exercise CB.T15). △

Of course, these examples are meant only to illustrate the definition of eigenvectors and eigenvalues for linear transformations, and therefore raise the question, "How would I find eigenvectors?" We will have an answer before we finish this section. We need one more construction first.

Subsection  CBM 
Change-of-Basis  Matrix 

Given  a vector  space,  we  know  we  can  usually  find  many  different  bases  for  the 
vector  space,  some  nice,  some  nasty.  If  we  choose  a single  vector  from  this  vector 
space,  we  can  build  many  different  representations  of  the  vector  by  constructing  the 
representations  relative  to  different  bases.  How  are  these  different  representations 
related  to  each  other?  A change-of-basis  matrix  answers  this  question. 

Definition  CBM  Change-of-Basis  Matrix 

Suppose that $V$ is a vector space, and $I_V: V \to V$ is the identity linear transformation on $V$. Let $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_n\}$ and $C$ be two bases of $V$. Then the change-of-basis matrix from $B$ to $C$ is the matrix representation of $I_V$ relative to $B$ and $C$,

\begin{align*}
C_{B,C} &= M^{I_V}_{B,C} \\
&= \left[\rho_C(I_V(\mathbf{v}_1)) \mid \rho_C(I_V(\mathbf{v}_2)) \mid \rho_C(I_V(\mathbf{v}_3)) \mid \cdots \mid \rho_C(I_V(\mathbf{v}_n))\right] \\
&= \left[\rho_C(\mathbf{v}_1) \mid \rho_C(\mathbf{v}_2) \mid \rho_C(\mathbf{v}_3) \mid \cdots \mid \rho_C(\mathbf{v}_n)\right]
\end{align*}

□

Notice that this definition is primarily about a single vector space ($V$) and two bases of $V$ ($B$, $C$). The linear transformation ($I_V$) is necessary but not critical. As you might expect, this matrix has something to do with changing bases. Here is the theorem that gives the matrix its name (not the other way around).

Theorem  CB  Change-of-Basis 

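Suppose that $\mathbf{v}$ is a vector in the vector space $V$ and $B$ and $C$ are bases of $V$. Then

\[ \rho_C(\mathbf{v}) = C_{B,C}\,\rho_B(\mathbf{v}) \]

Proof.

\begin{align*}
\rho_C(\mathbf{v}) &= \rho_C(I_V(\mathbf{v})) && \text{Definition IDLT} \\
&= M^{I_V}_{B,C}\,\rho_B(\mathbf{v}) && \text{Theorem FTMR} \\
&= C_{B,C}\,\rho_B(\mathbf{v}) && \text{Definition CBM}
\end{align*}

So the change-of-basis matrix can be used with matrix multiplication to convert a vector representation of a vector ($\mathbf{v}$) relative to one basis ($\rho_B(\mathbf{v})$) to a representation of the same vector relative to a second basis ($\rho_C(\mathbf{v})$).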

Theorem  ICBM  Inverse  of  Change-of-Basis  Matrix 

Suppose  that  V is  a vector  space,  and  B and  C are  bases  ofV.  Then  the  change-of- 
basis  matrix  Cb,c  is  nonsingular  and 

Cb]c  = Ccm 


Proof.  The  linear  transformation  Iy  : V — > V is  invertible,  and  its  inverse  is  itself,  Iy 
(check  this!).  So  by  Theorem  IMR,  the  matrix  = Cb,c  is  invertible.  Theorem 

NI  says  an  invertible  matrix  is  nonsingular. 

Then 


= Cc,b 


Definition  CBM 

Theorem  IMR 
Definition  IDLT 
Definition  CBM 
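Example CBP Change of basis with polynomials

The vector space $P_4$ (Example VSP) has two nice bases (Example BP),

\begin{align*}
B &= \{1,\, x,\, x^2,\, x^3,\, x^4\} \\
C &= \{1,\, 1 + x,\, 1 + x + x^2,\, 1 + x + x^2 + x^3,\, 1 + x + x^2 + x^3 + x^4\}
\end{align*}

To build the change-of-basis matrix between $B$ and $C$, we must first build a vector representation of each vector in $B$ relative to $C$,

\begin{align*}
\rho_C(1) &= \rho_C\left((1)(1)\right) = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \\
\rho_C(x) &= \rho_C\left((-1)(1) + (1)(1 + x)\right) = \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} \\
\rho_C(x^2) &= \rho_C\left((-1)(1 + x) + (1)(1 + x + x^2)\right) = \begin{bmatrix} 0 \\ -1 \\ 1 \\ 0 \\ 0 \end{bmatrix} \\
\rho_C(x^3) &= \rho_C\left((-1)(1 + x + x^2) + (1)(1 + x + x^2 + x^3)\right) = \begin{bmatrix} 0 \\ 0 \\ -1 \\ 1 \\ 0 \end{bmatrix} \\
\rho_C(x^4) &= \rho_C\left((-1)(1 + x + x^2 + x^3) + (1)(1 + x + x^2 + x^3 + x^4)\right) = \begin{bmatrix} 0 \\ 0 \\ 0 \\ -1 \\ 1 \end{bmatrix}
\end{align*}

Then we package up these vectors as the columns of a matrix,

\[ C_{B,C} = \begin{bmatrix} 1 & -1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \]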
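Now, to illustrate Theorem CB, consider the vector $\mathbf{u} = 5 - 3x + 2x^2 + 8x^3 - 3x^4$. We can build the representation of $\mathbf{u}$ relative to $B$ easily,

\[ \rho_B(\mathbf{u}) = \rho_B\left(5 - 3x + 2x^2 + 8x^3 - 3x^4\right) = \begin{bmatrix} 5 \\ -3 \\ 2 \\ 8 \\ -3 \end{bmatrix} \]

Applying Theorem CB, we obtain a second representation of $\mathbf{u}$, but now relative to $C$,

\begin{align*}
\rho_C(\mathbf{u}) &= C_{B,C}\,\rho_B(\mathbf{u}) && \text{Theorem CB} \\
&= \begin{bmatrix} 1 & -1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 5 \\ -3 \\ 2 \\ 8 \\ -3 \end{bmatrix} \\
&= \begin{bmatrix} 8 \\ -5 \\ -6 \\ 11 \\ -3 \end{bmatrix} && \text{Definition MVP}
\end{align*}

We can check our work by unraveling this second representation,

\begin{align*}
\mathbf{u} &= \rho_C^{-1}\left(\rho_C(\mathbf{u})\right) && \text{Definition IVLT} \\
&= \rho_C^{-1}\left(\begin{bmatrix} 8 \\ -5 \\ -6 \\ 11 \\ -3 \end{bmatrix}\right) \\
&= 8(1) + (-5)(1 + x) + (-6)(1 + x + x^2) \\
&\quad + (11)(1 + x + x^2 + x^3) + (-3)(1 + x + x^2 + x^3 + x^4) && \text{Definition VR} \\
&= 5 - 3x + 2x^2 + 8x^3 - 3x^4
\end{align*}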

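The change-of-basis matrix from $C$ to $B$ is actually easier to build. Grab each vector in the basis $C$ and form its representation relative to $B$,

\begin{align*}
\rho_B(1) &= \rho_B\left((1)1\right) = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \\
\rho_B(1 + x) &= \rho_B\left((1)1 + (1)x\right) = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} \\
\rho_B(1 + x + x^2) &= \rho_B\left((1)1 + (1)x + (1)x^2\right) = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} \\
\rho_B(1 + x + x^2 + x^3) &= \rho_B\left((1)1 + (1)x + (1)x^2 + (1)x^3\right) = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 0 \end{bmatrix} \\
\rho_B(1 + x + x^2 + x^3 + x^4) &= \rho_B\left((1)1 + (1)x + (1)x^2 + (1)x^3 + (1)x^4\right) = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}
\end{align*}

Then we package up these vectors as the columns of a matrix,

\[ C_{C,B} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \]

We formed two representations of the vector $\mathbf{u}$ above, so we can again provide a check on our computations by converting from the representation of $\mathbf{u}$ relative to $C$ to the representation of $\mathbf{u}$ relative to $B$,

\begin{align*}
\rho_B(\mathbf{u}) &= C_{C,B}\,\rho_C(\mathbf{u}) && \text{Theorem CB} \\
&= \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 8 \\ -5 \\ -6 \\ 11 \\ -3 \end{bmatrix} \\
&= \begin{bmatrix} 5 \\ -3 \\ 2 \\ 8 \\ -3 \end{bmatrix} && \text{Definition MVP}
\end{align*}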
One more computation that is either a check on our work, or an illustration of a theorem. The two change-of-basis matrices, $C_{B,C}$ and $C_{C,B}$, should be inverses of each other, according to Theorem ICBM. Here we go,

\[ C_{B,C}\,C_{C,B} = \begin{bmatrix} 1 & -1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \]

△
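The arithmetic of Example CBP is easy to confirm by machine. The following is a minimal sketch in plain Python, not part of the original example; the helper names `mat_vec` and `mat_mul` are our own, and matrices are stored as lists of rows.

```python
# Change-of-basis matrices from Example CBP, stored as lists of rows.
C_BC = [[1, -1, 0, 0, 0],
        [0, 1, -1, 0, 0],
        [0, 0, 1, -1, 0],
        [0, 0, 0, 1, -1],
        [0, 0, 0, 0, 1]]
C_CB = [[1 if j >= i else 0 for j in range(5)] for i in range(5)]


def mat_vec(A, v):
    """Matrix-vector product (Definition MVP) for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def mat_mul(A, B):
    """Matrix product, computed one column of B at a time."""
    cols = [mat_vec(A, list(col)) for col in zip(*B)]
    return [list(row) for row in zip(*cols)]


rho_B_u = [5, -3, 2, 8, -3]        # u = 5 - 3x + 2x^2 + 8x^3 - 3x^4, relative to B
rho_C_u = mat_vec(C_BC, rho_B_u)   # Theorem CB
identity = [[1 if i == j else 0 for j in range(5)] for i in range(5)]

print(rho_C_u)                               # representation of u relative to C
print(mat_vec(C_CB, rho_C_u) == rho_B_u)     # converting back recovers rho_B(u)
print(mat_mul(C_BC, C_CB) == identity)       # Theorem ICBM
```

Integer arithmetic is exact here, so the equality tests are genuine checks rather than approximate comparisons.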


The computations of the previous example are not meant to present any labor-saving devices, but instead are meant to illustrate the utility of the change-of-basis matrix. However, you might have noticed that $C_{C,B}$ was easier to compute than $C_{B,C}$. If you needed $C_{B,C}$, then you could first compute $C_{C,B}$ and then compute its inverse, which by Theorem ICBM, would equal $C_{B,C}$.
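As a sketch of that shortcut, the following plain Python inverts the easy matrix $C_{C,B}$ of Example CBP by row-reducing $[C_{C,B} \,|\, I_5]$ with exact rational arithmetic. The function name `inverse` is our own choice, and this is only one of several reasonable ways to compute an inverse.

```python
from fractions import Fraction


def inverse(A):
    """Invert a square matrix by row-reducing [A | I] with exact fractions."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        # Clear every other entry in the pivot column.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# C_{C,B} from Example CBP: ones on and above the diagonal.
C_CB = [[1 if j >= i else 0 for j in range(5)] for i in range(5)]
C_BC = inverse(C_CB)
for row in C_BC:
    print([int(x) for x in row])
```

The rows printed should match the matrix $C_{B,C}$ that was assembled column by column in the example.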

Here is another illustrative example. We have been concentrating on working with abstract vector spaces, but all of our theorems and techniques apply just as well to $\mathbb{C}^m$, the vector space of column vectors. We only need to use more complicated bases than the standard unit vectors (Theorem SUVB) to make things interesting.


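Example CBCV Change of basis with column vectors
For the vector space $\mathbb{C}^4$ we have the two bases,

\begin{align*}
B &= \left\{ \begin{bmatrix} 1 \\ -2 \\ 1 \\ -2 \end{bmatrix},\ \begin{bmatrix} -1 \\ 3 \\ 1 \\ 1 \end{bmatrix},\ \begin{bmatrix} 2 \\ -3 \\ 3 \\ -4 \end{bmatrix},\ \begin{bmatrix} -1 \\ 3 \\ 3 \\ 0 \end{bmatrix} \right\} \\
C &= \left\{ \begin{bmatrix} 1 \\ -6 \\ -4 \\ -1 \end{bmatrix},\ \begin{bmatrix} -4 \\ 8 \\ -5 \\ 8 \end{bmatrix},\ \begin{bmatrix} -5 \\ 13 \\ -2 \\ 9 \end{bmatrix},\ \begin{bmatrix} 3 \\ -7 \\ 3 \\ -6 \end{bmatrix} \right\}
\end{align*}

The change-of-basis matrix from $B$ to $C$ requires writing each vector of $B$ as a linear combination of the vectors in $C$,

\begin{align*}
\rho_C\left(\begin{bmatrix} 1 \\ -2 \\ 1 \\ -2 \end{bmatrix}\right) &= \rho_C\left( (1)\begin{bmatrix} 1 \\ -6 \\ -4 \\ -1 \end{bmatrix} + (-2)\begin{bmatrix} -4 \\ 8 \\ -5 \\ 8 \end{bmatrix} + (1)\begin{bmatrix} -5 \\ 13 \\ -2 \\ 9 \end{bmatrix} + (-1)\begin{bmatrix} 3 \\ -7 \\ 3 \\ -6 \end{bmatrix} \right) = \begin{bmatrix} 1 \\ -2 \\ 1 \\ -1 \end{bmatrix} \\
\rho_C\left(\begin{bmatrix} -1 \\ 3 \\ 1 \\ 1 \end{bmatrix}\right) &= \rho_C\left( (2)\begin{bmatrix} 1 \\ -6 \\ -4 \\ -1 \end{bmatrix} + (-3)\begin{bmatrix} -4 \\ 8 \\ -5 \\ 8 \end{bmatrix} + (3)\begin{bmatrix} -5 \\ 13 \\ -2 \\ 9 \end{bmatrix} + (0)\begin{bmatrix} 3 \\ -7 \\ 3 \\ -6 \end{bmatrix} \right) = \begin{bmatrix} 2 \\ -3 \\ 3 \\ 0 \end{bmatrix} \\
\rho_C\left(\begin{bmatrix} 2 \\ -3 \\ 3 \\ -4 \end{bmatrix}\right) &= \rho_C\left( (1)\begin{bmatrix} 1 \\ -6 \\ -4 \\ -1 \end{bmatrix} + (-3)\begin{bmatrix} -4 \\ 8 \\ -5 \\ 8 \end{bmatrix} + (1)\begin{bmatrix} -5 \\ 13 \\ -2 \\ 9 \end{bmatrix} + (-2)\begin{bmatrix} 3 \\ -7 \\ 3 \\ -6 \end{bmatrix} \right) = \begin{bmatrix} 1 \\ -3 \\ 1 \\ -2 \end{bmatrix} \\
\rho_C\left(\begin{bmatrix} -1 \\ 3 \\ 3 \\ 0 \end{bmatrix}\right) &= \rho_C\left( (2)\begin{bmatrix} 1 \\ -6 \\ -4 \\ -1 \end{bmatrix} + (-2)\begin{bmatrix} -4 \\ 8 \\ -5 \\ 8 \end{bmatrix} + (4)\begin{bmatrix} -5 \\ 13 \\ -2 \\ 9 \end{bmatrix} + (3)\begin{bmatrix} 3 \\ -7 \\ 3 \\ -6 \end{bmatrix} \right) = \begin{bmatrix} 2 \\ -2 \\ 4 \\ 3 \end{bmatrix}
\end{align*}

Then we package these vectors up as the change-of-basis matrix,

\[ C_{B,C} = \begin{bmatrix} 1 & 2 & 1 & 2 \\ -2 & -3 & -3 & -2 \\ 1 & 3 & 1 & 4 \\ -1 & 0 & -2 & 3 \end{bmatrix} \]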


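Now consider a single (arbitrary) vector $\mathbf{y} = \begin{bmatrix} 2 \\ 6 \\ -3 \\ 4 \end{bmatrix}$. First, build the vector representation of $\mathbf{y}$ relative to $B$. This will require writing $\mathbf{y}$ as a linear combination of the vectors in $B$,

\begin{align*}
\rho_B(\mathbf{y}) &= \rho_B\left(\begin{bmatrix} 2 \\ 6 \\ -3 \\ 4 \end{bmatrix}\right) \\
&= \rho_B\left( (-21)\begin{bmatrix} 1 \\ -2 \\ 1 \\ -2 \end{bmatrix} + (6)\begin{bmatrix} -1 \\ 3 \\ 1 \\ 1 \end{bmatrix} + (11)\begin{bmatrix} 2 \\ -3 \\ 3 \\ -4 \end{bmatrix} + (-7)\begin{bmatrix} -1 \\ 3 \\ 3 \\ 0 \end{bmatrix} \right) = \begin{bmatrix} -21 \\ 6 \\ 11 \\ -7 \end{bmatrix}
\end{align*}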

Now, applying Theorem CB we can convert the representation of $\mathbf{y}$ relative to $B$ into a representation relative to $C$,

\begin{align*}
\rho_C(\mathbf{y}) &= C_{B,C}\,\rho_B(\mathbf{y}) && \text{Theorem CB} \\
&= \begin{bmatrix} 1 & 2 & 1 & 2 \\ -2 & -3 & -3 & -2 \\ 1 & 3 & 1 & 4 \\ -1 & 0 & -2 & 3 \end{bmatrix} \begin{bmatrix} -21 \\ 6 \\ 11 \\ -7 \end{bmatrix} \\
&= \begin{bmatrix} -12 \\ 5 \\ -20 \\ -22 \end{bmatrix} && \text{Definition MVP}
\end{align*}

We could continue further with this example, perhaps by computing the representation of $\mathbf{y}$ relative to the basis $C$ directly as a check on our work (Exercise CB.C20). Or we could choose another vector to play the role of $\mathbf{y}$ and compute two different representations of this vector relative to the two bases $B$ and $C$. △
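With column vectors, each representation relative to $C$ is the solution of a linear system whose coefficient matrix has the vectors of $C$ as its columns, so the whole change-of-basis matrix can be found by row-reducing one augmented matrix. The sketch below, in plain Python with exact fractions, illustrates this for Example CBCV; the helper names `solve` and `columns` are our own, and `solve` assumes its first argument is nonsingular.

```python
from fractions import Fraction


def solve(A, R):
    """Return X with A X = R, by row-reducing [A | R]; A must be nonsingular."""
    n = len(A)
    aug = [[Fraction(x) for x in ra + rr] for ra, rr in zip(A, R)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def columns(vectors):
    """Place the given vectors as the columns of a list-of-rows matrix."""
    return [list(row) for row in zip(*vectors)]


B = [[1, -2, 1, -2], [-1, 3, 1, 1], [2, -3, 3, -4], [-1, 3, 3, 0]]
C = [[1, -6, -4, -1], [-4, 8, -5, 8], [-5, 13, -2, 9], [3, -7, 3, -6]]
y = [2, 6, -3, 4]

C_BC = solve(columns(C), columns(B))   # column j is rho_C of the j-th vector of B
rho_B_y = [row[0] for row in solve(columns(B), columns([y]))]
rho_C_y = [row[0] for row in solve(columns(C), columns([y]))]

# Theorem CB: multiplying rho_B(y) by C_{B,C} should reproduce rho_C(y).
via_theorem_CB = [sum(a * x for a, x in zip(row, rho_B_y)) for row in C_BC]
print(rho_C_y == via_theorem_CB)
```

Here `rho_C_y` is computed directly from the definition of a vector representation, while `via_theorem_CB` goes through the change-of-basis matrix; Theorem CB says the two must agree.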
Subsection MRS

Matrix  Representations  and  Similarity 

Here  is  the  main  theorem  of  this  section.  It  looks  a bit  involved  at  first  glance,  but 
the  proof  should  make  you  realize  it  is  not  all  that  complicated.  In  any  event,  we 
are  more  interested  in  a special  case. 

Theorem  MRCB  Matrix  Representation  and  Change  of  Basis 

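Suppose that $T : U \to V$ is a linear transformation, $B$ and $C$ are bases for $U$, and $D$ and $E$ are bases for $V$. Then

\[ M^{T}_{B,D} = C_{E,D}\,M^{T}_{C,E}\,C_{B,C} \]

Proof.

\begin{align*}
C_{E,D}\,M^{T}_{C,E}\,C_{B,C} &= M^{I_V}_{E,D}\,M^{T}_{C,E}\,M^{I_U}_{B,C} && \text{Definition CBM} \\
&= M^{I_V}_{E,D}\,M^{T \circ I_U}_{B,E} && \text{Theorem MRCLT} \\
&= M^{I_V}_{E,D}\,M^{T}_{B,E} && \text{Definition IDLT} \\
&= M^{I_V \circ T}_{B,D} && \text{Theorem MRCLT} \\
&= M^{T}_{B,D} && \text{Definition IDLT}
\end{align*}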


We  will  be  most  interested  in  a special  case  of  this  theorem  (Theorem  SCB),  but 
here  is  an  example  that  illustrates  the  full  generality  of  Theorem  MRCB. 

Example MRCM Matrix representations and change-of-basis matrices
Begin with two vector spaces, $S_2$, the subspace of $M_{22}$ containing all $2 \times 2$ symmetric matrices, and $P_3$ (Example VSP), the vector space of all polynomials of degree 3 or less. Then define the linear transformation $Q : S_2 \to P_3$ by

\[ Q\left(\begin{bmatrix} a & b \\ b & c \end{bmatrix}\right) = (5a - 2b + 6c) + (3a - b + 2c)x + (a + 3b - c)x^2 + (-4a + 2b + c)x^3 \]


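Here are two bases for each vector space, one nice, one nasty. First for $S_2$,

\begin{align*}
B &= \left\{ \begin{bmatrix} 5 & -3 \\ -3 & -2 \end{bmatrix},\ \begin{bmatrix} 2 & -3 \\ -3 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \right\} \\
C &= \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\ \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},\ \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}
\end{align*}

and then for $P_3$,

\begin{align*}
D &= \{2 + x - 2x^2 + 3x^3,\ -1 - 2x^2 + 3x^3,\ -3 - x + x^3,\ -x^2 + x^3\} \\
E &= \{1,\ x,\ x^2,\ x^3\}
\end{align*}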

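We will begin with a matrix representation of $Q$ relative to $C$ and $E$. We first find vector representations, relative to $E$, of the outputs of $Q$ on the elements of $C$,

\begin{align*}
\rho_E\left(Q\left(\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\right)\right) &= \rho_E\left(5 + 3x + x^2 - 4x^3\right) = \begin{bmatrix} 5 \\ 3 \\ 1 \\ -4 \end{bmatrix} \\
\rho_E\left(Q\left(\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\right)\right) &= \rho_E\left(-2 - x + 3x^2 + 2x^3\right) = \begin{bmatrix} -2 \\ -1 \\ 3 \\ 2 \end{bmatrix} \\
\rho_E\left(Q\left(\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\right)\right) &= \rho_E\left(6 + 2x - x^2 + x^3\right) = \begin{bmatrix} 6 \\ 2 \\ -1 \\ 1 \end{bmatrix}
\end{align*}

So

\[ M^{Q}_{C,E} = \begin{bmatrix} 5 & -2 & 6 \\ 3 & -1 & 2 \\ 1 & 3 & -1 \\ -4 & 2 & 1 \end{bmatrix} \]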

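Now we construct two change-of-basis matrices. First, $C_{B,C}$ requires vector representations of the elements of $B$, relative to $C$. Since $C$ is a nice basis, this is straightforward,

\begin{align*}
\rho_C\left(\begin{bmatrix} 5 & -3 \\ -3 & -2 \end{bmatrix}\right) &= \rho_C\left( (5)\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + (-3)\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + (-2)\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right) = \begin{bmatrix} 5 \\ -3 \\ -2 \end{bmatrix} \\
\rho_C\left(\begin{bmatrix} 2 & -3 \\ -3 & 0 \end{bmatrix}\right) &= \rho_C\left( (2)\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + (-3)\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + (0)\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right) = \begin{bmatrix} 2 \\ -3 \\ 0 \end{bmatrix} \\
\rho_C\left(\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}\right) &= \rho_C\left( (1)\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + (2)\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + (4)\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right) = \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}
\end{align*}

So

\[ C_{B,C} = \begin{bmatrix} 5 & 2 & 1 \\ -3 & -3 & 2 \\ -2 & 0 & 4 \end{bmatrix} \]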

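The other change-of-basis matrix we will compute is $C_{E,D}$. However, since $E$ is a nice basis (and $D$ is not) we will turn it around and instead compute $C_{D,E}$ and apply Theorem ICBM to use an inverse to compute $C_{E,D}$.

\begin{align*}
\rho_E\left(2 + x - 2x^2 + 3x^3\right) &= \rho_E\left((2)1 + (1)x + (-2)x^2 + (3)x^3\right) = \begin{bmatrix} 2 \\ 1 \\ -2 \\ 3 \end{bmatrix} \\
\rho_E\left(-1 - 2x^2 + 3x^3\right) &= \rho_E\left((-1)1 + (0)x + (-2)x^2 + (3)x^3\right) = \begin{bmatrix} -1 \\ 0 \\ -2 \\ 3 \end{bmatrix} \\
\rho_E\left(-3 - x + x^3\right) &= \rho_E\left((-3)1 + (-1)x + (0)x^2 + (1)x^3\right) = \begin{bmatrix} -3 \\ -1 \\ 0 \\ 1 \end{bmatrix} \\
\rho_E\left(-x^2 + x^3\right) &= \rho_E\left((0)1 + (0)x + (-1)x^2 + (1)x^3\right) = \begin{bmatrix} 0 \\ 0 \\ -1 \\ 1 \end{bmatrix}
\end{align*}

So, we can package these column vectors up as a matrix to obtain $C_{D,E}$ and then,

\begin{align*}
C_{E,D} &= \left(C_{D,E}\right)^{-1} && \text{Theorem ICBM} \\
&= \begin{bmatrix} 2 & -1 & -3 & 0 \\ 1 & 0 & -1 & 0 \\ -2 & -2 & 0 & -1 \\ 3 & 3 & 1 & 1 \end{bmatrix}^{-1} \\
&= \begin{bmatrix} 1 & -2 & 1 & 1 \\ -2 & 5 & -1 & -1 \\ 1 & -3 & 1 & 1 \\ 2 & -6 & -1 & 0 \end{bmatrix}
\end{align*}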


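We are now in a position to apply Theorem MRCB. The matrix representation of $Q$ relative to $B$ and $D$ can be obtained as follows,

\begin{align*}
M^{Q}_{B,D} &= C_{E,D}\,M^{Q}_{C,E}\,C_{B,C} && \text{Theorem MRCB} \\
&= \begin{bmatrix} 1 & -2 & 1 & 1 \\ -2 & 5 & -1 & -1 \\ 1 & -3 & 1 & 1 \\ 2 & -6 & -1 & 0 \end{bmatrix} \begin{bmatrix} 5 & -2 & 6 \\ 3 & -1 & 2 \\ 1 & 3 & -1 \\ -4 & 2 & 1 \end{bmatrix} \begin{bmatrix} 5 & 2 & 1 \\ -3 & -3 & 2 \\ -2 & 0 & 4 \end{bmatrix} \\
&= \begin{bmatrix} 1 & -2 & 1 & 1 \\ -2 & 5 & -1 & -1 \\ 1 & -3 & 1 & 1 \\ 2 & -6 & -1 & 0 \end{bmatrix} \begin{bmatrix} 19 & 16 & 25 \\ 14 & 9 & 9 \\ -2 & -7 & 3 \\ -28 & -14 & 4 \end{bmatrix} \\
&= \begin{bmatrix} -39 & -23 & 14 \\ 62 & 34 & -12 \\ -53 & -32 & 5 \\ -44 & -15 & -7 \end{bmatrix}
\end{align*}

Now check our work by computing $M^{Q}_{B,D}$ directly (Exercise CB.C21). △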
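Before attempting the direct computation by hand, the matrix products of Example MRCM can be confirmed mechanically. This is a minimal sketch in plain Python; the variable names simply mirror the subscripts of the matrices in the example, and `mat_mul` is our own helper.

```python
def mat_mul(A, B):
    """Product of two list-of-rows matrices with compatible sizes."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


C_ED = [[1, -2, 1, 1], [-2, 5, -1, -1], [1, -3, 1, 1], [2, -6, -1, 0]]
C_DE = [[2, -1, -3, 0], [1, 0, -1, 0], [-2, -2, 0, -1], [3, 3, 1, 1]]
M_CE = [[5, -2, 6], [3, -1, 2], [1, 3, -1], [-4, 2, 1]]
C_BC = [[5, 2, 1], [-3, -3, 2], [-2, 0, 4]]

identity = [[int(i == j) for j in range(4)] for i in range(4)]

# Theorem ICBM: C_{E,D} and C_{D,E} should be inverses of each other.
print(mat_mul(C_ED, C_DE) == identity)

# Theorem MRCB: the representation relative to B and D is a triple product.
M_BD = mat_mul(C_ED, mat_mul(M_CE, C_BC))
for row in M_BD:
    print(row)
```

The product is associated as $C_{E,D}\left(M^{Q}_{C,E}\,C_{B,C}\right)$, the same grouping used in the displayed computation, so the intermediate $4 \times 3$ matrix can be compared as well.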

Here  is  a special  case  of  the  previous  theorem,  where  we  choose  U and  V to 
be  the  same  vector  space,  so  the  matrix  representations  and  the  change-of-basis 
matrices  are  all  square  of  the  same  size. 

Theorem  SCB  Similarity  and  Change  of  Basis 

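Suppose that $T : V \to V$ is a linear transformation and $B$ and $C$ are bases of $V$. Then

\[ M^{T}_{B,B} = C_{B,C}^{-1}\,M^{T}_{C,C}\,C_{B,C} \]

Proof. In the conclusion of Theorem MRCB, replace $D$ by $B$, and replace $E$ by $C$,

\begin{align*}
M^{T}_{B,B} &= C_{C,B}\,M^{T}_{C,C}\,C_{B,C} && \text{Theorem MRCB} \\
&= C_{B,C}^{-1}\,M^{T}_{C,C}\,C_{B,C} && \text{Theorem ICBM}
\end{align*}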


This is the third surprise of this chapter. Theorem SCB considers the special case where a linear transformation has the same vector space for the domain and codomain ($V$). We build a matrix representation of $T$ using the basis $B$ simultaneously for both the domain and codomain ($M^{T}_{B,B}$), and then we build a second matrix representation of $T$, now using the basis $C$ for both the domain and codomain ($M^{T}_{C,C}$). Then these two representations are related via a similarity transformation (Definition SIM) using a change-of-basis matrix ($C_{B,C}$)!


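Example MRBE Matrix representation with basis of eigenvectors

We return to the linear transformation $T : M_{22} \to M_{22}$ of Example ELTBM defined by

\[ T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} -17a + 11b + 8c - 11d & -57a + 35b + 24c - 33d \\ -14a + 10b + 6c - 10d & -41a + 25b + 16c - 23d \end{bmatrix} \]

In Example ELTBM we showcased four eigenvectors of $T$. We will now put these four vectors in a set,

\[ B = \{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4\} = \left\{ \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix},\ \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & 3 \\ 2 & 3 \end{bmatrix},\ \begin{bmatrix} 2 & 6 \\ 1 & 4 \end{bmatrix} \right\} \]

Check that $B$ is a basis of $M_{22}$ by first establishing the linear independence of $B$ and then employing Theorem G to get the spanning property easily. Here is a second set of $2 \times 2$ matrices, which also forms a basis of $M_{22}$ (Example BM),

\[ C = \{\mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3, \mathbf{y}_4\} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\ \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\ \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},\ \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\} \]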

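We can build two matrix representations of $T$, one relative to $B$ and one relative to $C$. Each is easy, but for wildly different reasons. In our computation of the matrix representation relative to $B$ we borrow some of our work in Example ELTBM. Here are the representations, then the explanation.

\begin{align*}
\rho_B(T(\mathbf{x}_1)) &= \rho_B(2\mathbf{x}_1) = \rho_B(2\mathbf{x}_1 + 0\mathbf{x}_2 + 0\mathbf{x}_3 + 0\mathbf{x}_4) = \begin{bmatrix} 2 \\ 0 \\ 0 \\ 0 \end{bmatrix} \\
\rho_B(T(\mathbf{x}_2)) &= \rho_B(2\mathbf{x}_2) = \rho_B(0\mathbf{x}_1 + 2\mathbf{x}_2 + 0\mathbf{x}_3 + 0\mathbf{x}_4) = \begin{bmatrix} 0 \\ 2 \\ 0 \\ 0 \end{bmatrix} \\
\rho_B(T(\mathbf{x}_3)) &= \rho_B((-1)\mathbf{x}_3) = \rho_B(0\mathbf{x}_1 + 0\mathbf{x}_2 + (-1)\mathbf{x}_3 + 0\mathbf{x}_4) = \begin{bmatrix} 0 \\ 0 \\ -1 \\ 0 \end{bmatrix} \\
\rho_B(T(\mathbf{x}_4)) &= \rho_B((-2)\mathbf{x}_4) = \rho_B(0\mathbf{x}_1 + 0\mathbf{x}_2 + 0\mathbf{x}_3 + (-2)\mathbf{x}_4) = \begin{bmatrix} 0 \\ 0 \\ 0 \\ -2 \end{bmatrix}
\end{align*}

So the resulting representation is

\[ M^{T}_{B,B} = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -2 \end{bmatrix} \]

Very pretty.

Now for the matrix representation relative to $C$ first compute,

\begin{align*}
\rho_C(T(\mathbf{y}_1)) &= \rho_C\left(\begin{bmatrix} -17 & -57 \\ -14 & -41 \end{bmatrix}\right) = \rho_C\left((-17)\mathbf{y}_1 + (-57)\mathbf{y}_2 + (-14)\mathbf{y}_3 + (-41)\mathbf{y}_4\right) = \begin{bmatrix} -17 \\ -57 \\ -14 \\ -41 \end{bmatrix} \\
\rho_C(T(\mathbf{y}_2)) &= \rho_C\left(\begin{bmatrix} 11 & 35 \\ 10 & 25 \end{bmatrix}\right) = \rho_C\left(11\mathbf{y}_1 + 35\mathbf{y}_2 + 10\mathbf{y}_3 + 25\mathbf{y}_4\right) = \begin{bmatrix} 11 \\ 35 \\ 10 \\ 25 \end{bmatrix} \\
\rho_C(T(\mathbf{y}_3)) &= \rho_C\left(\begin{bmatrix} 8 & 24 \\ 6 & 16 \end{bmatrix}\right) = \rho_C\left(8\mathbf{y}_1 + 24\mathbf{y}_2 + 6\mathbf{y}_3 + 16\mathbf{y}_4\right) = \begin{bmatrix} 8 \\ 24 \\ 6 \\ 16 \end{bmatrix} \\
\rho_C(T(\mathbf{y}_4)) &= \rho_C\left(\begin{bmatrix} -11 & -33 \\ -10 & -23 \end{bmatrix}\right) = \rho_C\left((-11)\mathbf{y}_1 + (-33)\mathbf{y}_2 + (-10)\mathbf{y}_3 + (-23)\mathbf{y}_4\right) = \begin{bmatrix} -11 \\ -33 \\ -10 \\ -23 \end{bmatrix}
\end{align*}

So the resulting representation is

\[ M^{T}_{C,C} = \begin{bmatrix} -17 & 11 & 8 & -11 \\ -57 & 35 & 24 & -33 \\ -14 & 10 & 6 & -10 \\ -41 & 25 & 16 & -23 \end{bmatrix} \]

Not quite as pretty.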

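The purpose of this example is to illustrate Theorem SCB. This theorem says that the two matrix representations, $M^{T}_{B,B}$ and $M^{T}_{C,C}$, of the one linear transformation, $T$, are related by a similarity transformation using the change-of-basis matrix $C_{B,C}$. Let us compute this change-of-basis matrix. Notice that since $C$ is such a nice basis, this is fairly straightforward,

\begin{align*}
\rho_C(\mathbf{x}_1) &= \rho_C\left(\begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}\right) = \rho_C\left(0\mathbf{y}_1 + 1\mathbf{y}_2 + 0\mathbf{y}_3 + 1\mathbf{y}_4\right) = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \end{bmatrix} \\
\rho_C(\mathbf{x}_2) &= \rho_C\left(\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}\right) = \rho_C\left(1\mathbf{y}_1 + 1\mathbf{y}_2 + 1\mathbf{y}_3 + 0\mathbf{y}_4\right) = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \end{bmatrix} \\
\rho_C(\mathbf{x}_3) &= \rho_C\left(\begin{bmatrix} 1 & 3 \\ 2 & 3 \end{bmatrix}\right) = \rho_C\left(1\mathbf{y}_1 + 3\mathbf{y}_2 + 2\mathbf{y}_3 + 3\mathbf{y}_4\right) = \begin{bmatrix} 1 \\ 3 \\ 2 \\ 3 \end{bmatrix} \\
\rho_C(\mathbf{x}_4) &= \rho_C\left(\begin{bmatrix} 2 & 6 \\ 1 & 4 \end{bmatrix}\right) = \rho_C\left(2\mathbf{y}_1 + 6\mathbf{y}_2 + 1\mathbf{y}_3 + 4\mathbf{y}_4\right) = \begin{bmatrix} 2 \\ 6 \\ 1 \\ 4 \end{bmatrix}
\end{align*}

So we have,

\[ C_{B,C} = \begin{bmatrix} 0 & 1 & 1 & 2 \\ 1 & 1 & 3 & 6 \\ 0 & 1 & 2 & 1 \\ 1 & 0 & 3 & 4 \end{bmatrix} \]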

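Now, according to Theorem SCB we can write,

\begin{align*}
M^{T}_{B,B} &= C_{B,C}^{-1}\,M^{T}_{C,C}\,C_{B,C} \\
\begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -2 \end{bmatrix} &= \begin{bmatrix} 0 & 1 & 1 & 2 \\ 1 & 1 & 3 & 6 \\ 0 & 1 & 2 & 1 \\ 1 & 0 & 3 & 4 \end{bmatrix}^{-1} \begin{bmatrix} -17 & 11 & 8 & -11 \\ -57 & 35 & 24 & -33 \\ -14 & 10 & 6 & -10 \\ -41 & 25 & 16 & -23 \end{bmatrix} \begin{bmatrix} 0 & 1 & 1 & 2 \\ 1 & 1 & 3 & 6 \\ 0 & 1 & 2 & 1 \\ 1 & 0 & 3 & 4 \end{bmatrix}
\end{align*}

This should look and feel exactly like the process for diagonalizing a matrix, as was described in Section SD. And it is. △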
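The similarity relation of Example MRBE can be checked without computing an inverse: since $C_{B,C}$ is nonsingular (Theorem ICBM), the equation $M^{T}_{B,B} = C_{B,C}^{-1} M^{T}_{C,C} C_{B,C}$ is equivalent to $C_{B,C}\,M^{T}_{B,B} = M^{T}_{C,C}\,C_{B,C}$. The following plain Python sketch tests that second equation; the variable and helper names are our own.

```python
def mat_mul(A, B):
    """Product of two list-of-rows matrices with compatible sizes."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


M_CC = [[-17, 11, 8, -11], [-57, 35, 24, -33], [-14, 10, 6, -10], [-41, 25, 16, -23]]
C_BC = [[0, 1, 1, 2], [1, 1, 3, 6], [0, 1, 2, 1], [1, 0, 3, 4]]
M_BB = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, -1, 0], [0, 0, 0, -2]]

# Left-multiplying the conclusion of Theorem SCB by C_BC removes the inverse.
left = mat_mul(M_CC, C_BC)
right = mat_mul(C_BC, M_BB)
print(left == right)
```

Read column by column, the equation says that $M^{T}_{C,C}$ sends each column of $C_{B,C}$ to a scalar multiple of itself, which is just the eigenvector property of $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4$ expressed in coordinates relative to $C$.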


We  can  now  return  to  the  question  of  computing  an  eigenvalue  or  eigenvector 
of  a linear  transformation.  For  a linear  transformation  of  the  form  T : V — > V,  we 
know  that  representations  relative  to  different  bases  are  similar  matrices.  We  also 
know  that  similar  matrices  have  equal  characteristic  polynomials  by  Theorem  SMEE. 
We  will  now  show  that  eigenvalues  of  a linear  transformation  T are  precisely  the 
eigenvalues  of  any  matrix  representation  of  T.  Since  the  choice  of  a different  matrix 
representation  leads  to  a similar  matrix,  there  will  be  no  “new”  eigenvalues  obtained 
from  this  second  representation.  Similarly,  the  change-of-basis  matrix  can  be  used 
to  show  that  eigenvectors  obtained  from  one  matrix  representation  will  be  precisely 
those  obtained  from  any  other  representation.  So  we  can  determine  the  eigenvalues 
and  eigenvectors  of  a linear  transformation  by  forming  one  matrix  representation, 
using  any  basis  we  please,  and  analyzing  the  matrix  in  the  manner  of  Chapter  E. 

Theorem  EER  Eigenvalues,  Eigenvectors,  Representations 
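Suppose that $T : V \to V$ is a linear transformation and $B$ is a basis of $V$. Then $\mathbf{v} \in V$ is an eigenvector of $T$ for the eigenvalue $\lambda$ if and only if $\rho_B(\mathbf{v})$ is an eigenvector of $M^{T}_{B,B}$ for the eigenvalue $\lambda$.

Proof. ($\Rightarrow$) Assume that $\mathbf{v} \in V$ is an eigenvector of $T$ for the eigenvalue $\lambda$. Then

\begin{align*}
M^{T}_{B,B}\,\rho_B(\mathbf{v}) &= \rho_B(T(\mathbf{v})) && \text{Theorem FTMR} \\
&= \rho_B(\lambda\mathbf{v}) && \text{Definition EELT} \\
&= \lambda\,\rho_B(\mathbf{v}) && \text{Theorem VRLT}
\end{align*}

which by Definition EEM says that $\rho_B(\mathbf{v})$ is an eigenvector of the matrix $M^{T}_{B,B}$ for the eigenvalue $\lambda$.

($\Leftarrow$) Assume that $\rho_B(\mathbf{v})$ is an eigenvector of $M^{T}_{B,B}$ for the eigenvalue $\lambda$. Then

\begin{align*}
T(\mathbf{v}) &= \rho_B^{-1}\left(\rho_B(T(\mathbf{v}))\right) && \text{Definition IVLT} \\
&= \rho_B^{-1}\left(M^{T}_{B,B}\,\rho_B(\mathbf{v})\right) && \text{Theorem FTMR} \\
&= \rho_B^{-1}\left(\lambda\,\rho_B(\mathbf{v})\right) && \text{Definition EEM} \\
&= \lambda\,\rho_B^{-1}\left(\rho_B(\mathbf{v})\right) && \text{Theorem ILTLT} \\
&= \lambda\mathbf{v} && \text{Definition IVLT}
\end{align*}

which by Definition EELT says $\mathbf{v}$ is an eigenvector of $T$ for the eigenvalue $\lambda$. ■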


Subsection  CELT 

Computing  Eigenvectors  of  Linear  Transformations 

Theorem  EER  tells  us  that  the  eigenvalues  of  a linear  transformation  are  the 
eigenvalues  of  any  representation,  no  matter  what  the  choice  of  the  basis  B might  be. 
So  we  could  now  unambiguously  define  items  such  as  the  characteristic  polynomial 
of  a linear  transformation,  which  we  would  define  as  the  characteristic  polynomial 
of  any  matrix  representation.  We  will  say  that  again  — eigenvalues,  eigenvectors, 
and  characteristic  polynomials  are  intrinsic  properties  of  a linear  transformation, 
independent  of  the  choice  of  a basis  used  to  construct  a matrix  representation. 

As a practical matter, how does one compute the eigenvalues and eigenvectors of a linear transformation of the form $T : V \to V$? Choose a nice basis $B$ for $V$, one where the vector representations of the values of the linear transformation necessary for the matrix representation are easy to compute. Construct the matrix representation relative to this basis, and find the eigenvalues and eigenvectors of this matrix using the techniques of Chapter E. The resulting eigenvalues of the matrix are precisely the eigenvalues of the linear transformation. The eigenvectors of the matrix are column vectors that need to be converted to vectors in $V$ through application of $\rho_B^{-1}$ (this is part of the content of Theorem EER).

Now consider the case where the matrix representation of a linear transformation is diagonalizable. The $n$ linearly independent eigenvectors that must exist for the matrix (Theorem DC) can be converted (via $\rho_B^{-1}$) into eigenvectors of the linear transformation. A matrix representation of the linear transformation relative to a basis of eigenvectors will be a diagonal matrix, an especially nice representation! Though we did not know it at the time, the diagonalizations of Section SD were really about finding especially pleasing matrix representations of linear transformations.

Here  are  some  examples. 

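Example ELTT Eigenvectors of a linear transformation, twice
Consider the linear transformation $S : M_{22} \to M_{22}$ defined by

\[ S\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} -b - c - 3d & -14a - 15b - 13c + d \\ 18a + 21b + 19c + 3d & -6a - 7b - 7c - 3d \end{bmatrix} \]

To find the eigenvalues and eigenvectors of $S$ we will build a matrix representation and analyze the matrix. Since Theorem EER places no restriction on the choice of the basis $B$, we may as well use a basis that is easy to work with. So set

\[ B = \{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4\} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\ \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\ \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},\ \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\} \]

Then to build the matrix representation of $S$ relative to $B$ compute,

\begin{align*}
\rho_B(S(\mathbf{x}_1)) &= \rho_B\left(\begin{bmatrix} 0 & -14 \\ 18 & -6 \end{bmatrix}\right) = \rho_B\left(0\mathbf{x}_1 + (-14)\mathbf{x}_2 + 18\mathbf{x}_3 + (-6)\mathbf{x}_4\right) = \begin{bmatrix} 0 \\ -14 \\ 18 \\ -6 \end{bmatrix} \\
\rho_B(S(\mathbf{x}_2)) &= \rho_B\left(\begin{bmatrix} -1 & -15 \\ 21 & -7 \end{bmatrix}\right) = \rho_B\left((-1)\mathbf{x}_1 + (-15)\mathbf{x}_2 + 21\mathbf{x}_3 + (-7)\mathbf{x}_4\right) = \begin{bmatrix} -1 \\ -15 \\ 21 \\ -7 \end{bmatrix} \\
\rho_B(S(\mathbf{x}_3)) &= \rho_B\left(\begin{bmatrix} -1 & -13 \\ 19 & -7 \end{bmatrix}\right) = \rho_B\left((-1)\mathbf{x}_1 + (-13)\mathbf{x}_2 + 19\mathbf{x}_3 + (-7)\mathbf{x}_4\right) = \begin{bmatrix} -1 \\ -13 \\ 19 \\ -7 \end{bmatrix} \\
\rho_B(S(\mathbf{x}_4)) &= \rho_B\left(\begin{bmatrix} -3 & 1 \\ 3 & -3 \end{bmatrix}\right) = \rho_B\left((-3)\mathbf{x}_1 + 1\mathbf{x}_2 + 3\mathbf{x}_3 + (-3)\mathbf{x}_4\right) = \begin{bmatrix} -3 \\ 1 \\ 3 \\ -3 \end{bmatrix}
\end{align*}

So by Definition MR we have

\[ M = M^{S}_{B,B} = \begin{bmatrix} 0 & -1 & -1 & -3 \\ -14 & -15 & -13 & 1 \\ 18 & 21 & 19 & 3 \\ -6 & -7 & -7 & -3 \end{bmatrix} \]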
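Now compute eigenvalues and eigenvectors of the matrix representation $M$ with the techniques of Section EE. First the characteristic polynomial,

\[ p_M(x) = \det(M - xI_4) = x^4 - x^3 - 10x^2 + 4x + 24 = (x - 3)(x - 2)(x + 2)^2 \]

We could now make statements about the eigenvalues of $M$, but in light of Theorem EER we can refer to the eigenvalues of $S$ and mildly abuse (or extend) our notation for multiplicities to write

\[ \alpha_S(3) = 1 \qquad \alpha_S(2) = 1 \qquad \alpha_S(-2) = 2 \]

Now compute the eigenvectors of $M$,

\begin{align*}
\lambda = 3: \quad M - 3I_4 &= \begin{bmatrix} -3 & -1 & -1 & -3 \\ -14 & -18 & -13 & 1 \\ 18 & 21 & 16 & 3 \\ -6 & -7 & -7 & -6 \end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & -3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix} \\
\mathcal{E}_M(3) &= \mathcal{N}(M - 3I_4) = \left\langle \left\{ \begin{bmatrix} -1 \\ 3 \\ -3 \\ 1 \end{bmatrix} \right\} \right\rangle \\
\lambda = 2: \quad M - 2I_4 &= \begin{bmatrix} -2 & -1 & -1 & -3 \\ -14 & -17 & -13 & 1 \\ 18 & 21 & 17 & 3 \\ -6 & -7 & -7 & -5 \end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & -4 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix} \\
\mathcal{E}_M(2) &= \mathcal{N}(M - 2I_4) = \left\langle \left\{ \begin{bmatrix} -2 \\ 4 \\ -3 \\ 1 \end{bmatrix} \right\} \right\rangle \\
\lambda = -2: \quad M - (-2)I_4 &= \begin{bmatrix} 2 & -1 & -1 & -3 \\ -14 & -13 & -13 & 1 \\ 18 & 21 & 21 & 3 \\ -6 & -7 & -7 & -1 \end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \\
\mathcal{E}_M(-2) &= \mathcal{N}(M - (-2)I_4) = \left\langle \left\{ \begin{bmatrix} 0 \\ -1 \\ 1 \\ 0 \end{bmatrix},\ \begin{bmatrix} 1 \\ -1 \\ 0 \\ 1 \end{bmatrix} \right\} \right\rangle
\end{align*}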

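According to Theorem EER the eigenvectors just listed as basis vectors for the eigenspaces of $M$ are vector representations (relative to $B$) of eigenvectors for $S$. So the application of the inverse function $\rho_B^{-1}$ will convert these column vectors into elements of the vector space $M_{22}$ ($2 \times 2$ matrices) that are eigenvectors of $S$. Since $\rho_B$ is an isomorphism (Theorem VRILT), so is $\rho_B^{-1}$. Applying the inverse function will then preserve linear independence and spanning properties, so with a sweeping application of the Coordinatization Principle and some extensions of our previous notation for eigenspaces and geometric multiplicities, we can write,

\begin{align*}
\rho_B^{-1}\left(\begin{bmatrix} -1 \\ 3 \\ -3 \\ 1 \end{bmatrix}\right) &= (-1)\mathbf{x}_1 + 3\mathbf{x}_2 + (-3)\mathbf{x}_3 + 1\mathbf{x}_4 = \begin{bmatrix} -1 & 3 \\ -3 & 1 \end{bmatrix} \\
\rho_B^{-1}\left(\begin{bmatrix} -2 \\ 4 \\ -3 \\ 1 \end{bmatrix}\right) &= (-2)\mathbf{x}_1 + 4\mathbf{x}_2 + (-3)\mathbf{x}_3 + 1\mathbf{x}_4 = \begin{bmatrix} -2 & 4 \\ -3 & 1 \end{bmatrix} \\
\rho_B^{-1}\left(\begin{bmatrix} 0 \\ -1 \\ 1 \\ 0 \end{bmatrix}\right) &= 0\mathbf{x}_1 + (-1)\mathbf{x}_2 + 1\mathbf{x}_3 + 0\mathbf{x}_4 = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \\
\rho_B^{-1}\left(\begin{bmatrix} 1 \\ -1 \\ 0 \\ 1 \end{bmatrix}\right) &= 1\mathbf{x}_1 + (-1)\mathbf{x}_2 + 0\mathbf{x}_3 + 1\mathbf{x}_4 = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}
\end{align*}

So

\begin{align*}
\mathcal{E}_S(3) &= \left\langle \left\{ \begin{bmatrix} -1 & 3 \\ -3 & 1 \end{bmatrix} \right\} \right\rangle \\
\mathcal{E}_S(2) &= \left\langle \left\{ \begin{bmatrix} -2 & 4 \\ -3 & 1 \end{bmatrix} \right\} \right\rangle \\
\mathcal{E}_S(-2) &= \left\langle \left\{ \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \right\} \right\rangle
\end{align*}

with geometric multiplicities given by

\[ \gamma_S(3) = 1 \qquad \gamma_S(2) = 1 \qquad \gamma_S(-2) = 2 \]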
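Because these eigenvectors are now honest $2 \times 2$ matrices, they can be tested against the definition of $S$ itself, with no matrix representation involved. The sketch below is plain Python; the function names `S`, `scale` and `rho_B_inverse` are our own, and `rho_B_inverse` assumes the standard basis $B$ chosen in this example.

```python
def S(X):
    """The linear transformation S of Example ELTT, acting on a 2x2 matrix."""
    (a, b), (c, d) = X
    return [[-b - c - 3 * d, -14 * a - 15 * b - 13 * c + d],
            [18 * a + 21 * b + 19 * c + 3 * d, -6 * a - 7 * b - 7 * c - 3 * d]]


def scale(k, X):
    """Scalar multiple of a 2x2 matrix."""
    return [[k * entry for entry in row] for row in X]


def rho_B_inverse(v):
    """Rebuild a 2x2 matrix from its coordinates relative to the standard basis B."""
    return [[v[0], v[1]], [v[2], v[3]]]


# (eigenvalue, eigenvector of the matrix M) pairs found above.
eigenpairs = [(3, [-1, 3, -3, 1]), (2, [-2, 4, -3, 1]),
              (-2, [0, -1, 1, 0]), (-2, [1, -1, 0, 1])]

# Definition EELT: S(X) should equal lambda * X for each converted vector.
checks = [S(rho_B_inverse(v)) == scale(lam, rho_B_inverse(v)) for lam, v in eigenpairs]
print(checks)
```

This is Theorem EER in action: the column vectors were found from $M$, yet the test is carried out entirely in $M_{22}$.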


Suppose we now decided to build another matrix representation of $S$, only now relative to a linearly independent set of eigenvectors of $S$, such as

\[ C = \left\{ \begin{bmatrix} -1 & 3 \\ -3 & 1 \end{bmatrix},\ \begin{bmatrix} -2 & 4 \\ -3 & 1 \end{bmatrix},\ \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \right\} \]

At this point you should have computed enough matrix representations to predict that the result of representing $S$ relative to $C$ will be a diagonal matrix. Computing this representation is an example of how Theorem SCB generalizes the diagonalizations from Section SD. For the record, here is the diagonal representation,

\[ M^{S}_{C,C} = \begin{bmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & -2 & 0 \\ 0 & 0 & 0 & -2 \end{bmatrix} \]

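Our interest in this example is not necessarily building nice representations, but instead we want to demonstrate how eigenvalues and eigenvectors are an intrinsic property of a linear transformation, independent of any particular representation. To this end, we will repeat the foregoing, but replace $B$ by another basis. We will make this basis different, but not extremely so,

\[ D = \{\mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3, \mathbf{y}_4\} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \right\} \]

Then to build the matrix representation of $S$ relative to $D$ compute,

\begin{align*}
\rho_D(S(\mathbf{y}_1)) &= \rho_D\left(\begin{bmatrix} 0 & -14 \\ 18 & -6 \end{bmatrix}\right) = \rho_D\left(14\mathbf{y}_1 + (-32)\mathbf{y}_2 + 24\mathbf{y}_3 + (-6)\mathbf{y}_4\right) = \begin{bmatrix} 14 \\ -32 \\ 24 \\ -6 \end{bmatrix} \\
\rho_D(S(\mathbf{y}_2)) &= \rho_D\left(\begin{bmatrix} -1 & -29 \\ 39 & -13 \end{bmatrix}\right) = \rho_D\left(28\mathbf{y}_1 + (-68)\mathbf{y}_2 + 52\mathbf{y}_3 + (-13)\mathbf{y}_4\right) = \begin{bmatrix} 28 \\ -68 \\ 52 \\ -13 \end{bmatrix} \\
\rho_D(S(\mathbf{y}_3)) &= \rho_D\left(\begin{bmatrix} -2 & -42 \\ 58 & -20 \end{bmatrix}\right) = \rho_D\left(40\mathbf{y}_1 + (-100)\mathbf{y}_2 + 78\mathbf{y}_3 + (-20)\mathbf{y}_4\right) = \begin{bmatrix} 40 \\ -100 \\ 78 \\ -20 \end{bmatrix} \\
\rho_D(S(\mathbf{y}_4)) &= \rho_D\left(\begin{bmatrix} -5 & -41 \\ 61 & -23 \end{bmatrix}\right) = \rho_D\left(36\mathbf{y}_1 + (-102)\mathbf{y}_2 + 84\mathbf{y}_3 + (-23)\mathbf{y}_4\right) = \begin{bmatrix} 36 \\ -102 \\ 84 \\ -23 \end{bmatrix}
\end{align*}

So by Definition MR we have

\[ N = M^{S}_{D,D} = \begin{bmatrix} 14 & 28 & 40 & 36 \\ -32 & -68 & -100 & -102 \\ 24 & 52 & 78 & 84 \\ -6 & -13 & -20 & -23 \end{bmatrix} \]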

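Now compute eigenvalues and eigenvectors of the matrix representation $N$ with the techniques of Section EE. First the characteristic polynomial,

\[ p_N(x) = \det(N - xI_4) = x^4 - x^3 - 10x^2 + 4x + 24 = (x - 3)(x - 2)(x + 2)^2 \]

Of course this is not news. We now know that $M = M^{S}_{B,B}$ and $N = M^{S}_{D,D}$ are similar matrices (Theorem SCB). But Theorem SMEE told us long ago that similar matrices have identical characteristic polynomials. Now compute eigenvectors for the matrix representation, which will be different than what we found for $M$,

\begin{align*}
\lambda = 3: \quad N - 3I_4 &= \begin{bmatrix} 11 & 28 & 40 & 36 \\ -32 & -71 & -100 & -102 \\ 24 & 52 & 75 & 84 \\ -6 & -13 & -20 & -26 \end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} 1 & 0 & 0 & 4 \\ 0 & 1 & 0 & -6 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix} \\
\mathcal{E}_N(3) &= \mathcal{N}(N - 3I_4) = \left\langle \left\{ \begin{bmatrix} -4 \\ 6 \\ -4 \\ 1 \end{bmatrix} \right\} \right\rangle \\
\lambda = 2: \quad N - 2I_4 &= \begin{bmatrix} 12 & 28 & 40 & 36 \\ -32 & -70 & -100 & -102 \\ 24 & 52 & 76 & 84 \\ -6 & -13 & -20 & -25 \end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} 1 & 0 & 0 & 6 \\ 0 & 1 & 0 & -7 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix} \\
\mathcal{E}_N(2) &= \mathcal{N}(N - 2I_4) = \left\langle \left\{ \begin{bmatrix} -6 \\ 7 \\ -4 \\ 1 \end{bmatrix} \right\} \right\rangle \\
\lambda = -2: \quad N - (-2)I_4 &= \begin{bmatrix} 16 & 28 & 40 & 36 \\ -32 & -66 & -100 & -102 \\ 24 & 52 & 80 & 84 \\ -6 & -13 & -20 & -21 \end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} 1 & 0 & -1 & -3 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \\
\mathcal{E}_N(-2) &= \mathcal{N}(N - (-2)I_4) = \left\langle \left\{ \begin{bmatrix} 1 \\ -2 \\ 1 \\ 0 \end{bmatrix},\ \begin{bmatrix} 3 \\ -3 \\ 0 \\ 1 \end{bmatrix} \right\} \right\rangle
\end{align*}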

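Employing Theorem EER we can apply $\rho_D^{-1}$ to each of the basis vectors of the eigenspaces of $N$ to obtain eigenvectors for $S$ that also form bases for eigenspaces of $S$,

\begin{align*}
\rho_D^{-1}\left(\begin{bmatrix} -4 \\ 6 \\ -4 \\ 1 \end{bmatrix}\right) &= (-4)\mathbf{y}_1 + 6\mathbf{y}_2 + (-4)\mathbf{y}_3 + 1\mathbf{y}_4 = \begin{bmatrix} -1 & 3 \\ -3 & 1 \end{bmatrix} \\
\rho_D^{-1}\left(\begin{bmatrix} -6 \\ 7 \\ -4 \\ 1 \end{bmatrix}\right) &= (-6)\mathbf{y}_1 + 7\mathbf{y}_2 + (-4)\mathbf{y}_3 + 1\mathbf{y}_4 = \begin{bmatrix} -2 & 4 \\ -3 & 1 \end{bmatrix} \\
\rho_D^{-1}\left(\begin{bmatrix} 1 \\ -2 \\ 1 \\ 0 \end{bmatrix}\right) &= 1\mathbf{y}_1 + (-2)\mathbf{y}_2 + 1\mathbf{y}_3 + 0\mathbf{y}_4 = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \\
\rho_D^{-1}\left(\begin{bmatrix} 3 \\ -3 \\ 0 \\ 1 \end{bmatrix}\right) &= 3\mathbf{y}_1 + (-3)\mathbf{y}_2 + 0\mathbf{y}_3 + 1\mathbf{y}_4 = \begin{bmatrix} 1 & -2 \\ 1 & 1 \end{bmatrix}
\end{align*}

The eigenspaces for the eigenvalues of algebraic multiplicity 1 are exactly as before,

\begin{align*}
\mathcal{E}_S(3) &= \left\langle \left\{ \begin{bmatrix} -1 & 3 \\ -3 & 1 \end{bmatrix} \right\} \right\rangle \\
\mathcal{E}_S(2) &= \left\langle \left\{ \begin{bmatrix} -2 & 4 \\ -3 & 1 \end{bmatrix} \right\} \right\rangle
\end{align*}

However, the eigenspace for $\lambda = -2$ would at first glance appear to be different. Here are the two eigenspaces for $\lambda = -2$, first the eigenspace obtained from $M = M^{S}_{B,B}$, then followed by the eigenspace obtained from $N = M^{S}_{D,D}$.

\begin{align*}
\mathcal{E}_S(-2) &= \left\langle \left\{ \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \right\} \right\rangle \\
\mathcal{E}_S(-2) &= \left\langle \left\{ \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & -2 \\ 1 & 1 \end{bmatrix} \right\} \right\rangle
\end{align*}

Subspaces generally have many bases, and that is the situation here. With a careful proof of set equality, you can show that these two eigenspaces are equal sets. The key observation to make such a proof go is that

\[ \begin{bmatrix} 1 & -2 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} + \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \]

which will establish that the second set is a subset of the first. With equal dimensions, Theorem EDYES will finish the task.

So the eigenvalues of a linear transformation are independent of the matrix representation employed to compute them! △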
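The two representations $M$ and $N$ of Example ELTT are tied together by Theorem SCB through the change-of-basis matrix $C_{D,B}$, whose columns are the representations of the vectors of $D$ relative to the standard basis $B$. The plain Python sketch below checks the inverse-free form $M\,C_{D,B} = C_{D,B}\,N$ of that relation, and uses Theorem CB to carry eigenvectors of $N$ over to eigenvectors of $M$. The variable and helper names are our own.

```python
def mat_vec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def mat_mul(A, B):
    """Product of two list-of-rows matrices with compatible sizes."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


M = [[0, -1, -1, -3], [-14, -15, -13, 1], [18, 21, 19, 3], [-6, -7, -7, -3]]
N = [[14, 28, 40, 36], [-32, -68, -100, -102], [24, 52, 78, 84], [-6, -13, -20, -23]]
# Columns are the representations, relative to B, of the four vectors of D.
C_DB = [[1, 1, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]

# Theorem SCB, N = C_DB^{-1} M C_DB, rewritten without the inverse.
print(mat_mul(M, C_DB) == mat_mul(C_DB, N))

# Theorem CB converts eigenvectors of N (relative to D) to coordinates relative to B.
print(mat_vec(C_DB, [-4, 6, -4, 1]))   # eigenvector of N for eigenvalue 3
print(mat_vec(C_DB, [3, -3, 0, 1]))    # second basis vector of the eigenspace for -2
```

The last vector printed is the sum of the two basis vectors found for $\mathcal{E}_M(-2)$, which is the same observation that makes the set-equality argument work.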
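Another example, this time a bit larger and with complex eigenvalues.

Example CELT Complex eigenvectors of a linear transformation
Consider the linear transformation $Q : P_4 \to P_4$ defined by

\begin{align*}
Q(a + bx + cx^2 + dx^3 + ex^4) ={}& (-46a - 22b + 13c + 5d + e) + (117a + 57b - 32c - 15d - 4e)x \\
&+ (-69a - 29b + 21c - 7e)x^2 + (159a + 73b - 44c - 13d + 2e)x^3 \\
&+ (-195a - 87b + 55c + 10d - 13e)x^4
\end{align*}

Choose a simple basis to compute with, say

\[ B = \{1,\ x,\ x^2,\ x^3,\ x^4\} \]

Then it should be apparent that the matrix representation of $Q$ relative to $B$ is

\[ M = M^{Q}_{B,B} = \begin{bmatrix} -46 & -22 & 13 & 5 & 1 \\ 117 & 57 & -32 & -15 & -4 \\ -69 & -29 & 21 & 0 & -7 \\ 159 & 73 & -44 & -13 & 2 \\ -195 & -87 & 55 & 10 & -13 \end{bmatrix} \]

Compute the characteristic polynomial, eigenvalues and eigenvectors according to the techniques of Section EE,

\begin{align*}
p_Q(x) &= -x^5 + 6x^4 - x^3 - 88x^2 + 252x - 208 \\
&= -(x - 2)^2(x + 4)(x^2 - 6x + 13) \\
&= -(x - 2)^2(x + 4)(x - (3 + 2i))(x - (3 - 2i))
\end{align*}

\[ \alpha_Q(2) = 2 \qquad \alpha_Q(-4) = 1 \qquad \alpha_Q(3 + 2i) = 1 \qquad \alpha_Q(3 - 2i) = 1 \]

\begin{align*}
\lambda = 2: \quad M - (2)I_5 &= \begin{bmatrix} -48 & -22 & 13 & 5 & 1 \\ 117 & 55 & -32 & -15 & -4 \\ -69 & -29 & 19 & 0 & -7 \\ 159 & 73 & -44 & -15 & 2 \\ -195 & -87 & 55 & 10 & -15 \end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} 1 & 0 & 0 & \frac{1}{2} & -\frac{1}{2} \\ 0 & 1 & 0 & -\frac{5}{2} & -\frac{5}{2} \\ 0 & 0 & 1 & -2 & -6 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \\
\mathcal{E}_M(2) &= \mathcal{N}(M - (2)I_5) = \left\langle \left\{ \begin{bmatrix} -\frac{1}{2} \\ \frac{5}{2} \\ 2 \\ 1 \\ 0 \end{bmatrix},\ \begin{bmatrix} \frac{1}{2} \\ \frac{5}{2} \\ 6 \\ 0 \\ 1 \end{bmatrix} \right\} \right\rangle = \left\langle \left\{ \begin{bmatrix} -1 \\ 5 \\ 4 \\ 2 \\ 0 \end{bmatrix},\ \begin{bmatrix} 1 \\ 5 \\ 12 \\ 0 \\ 2 \end{bmatrix} \right\} \right\rangle \\
\lambda = -4: \quad M - (-4)I_5 &= \begin{bmatrix} -42 & -22 & 13 & 5 & 1 \\ 117 & 61 & -32 & -15 & -4 \\ -69 & -29 & 25 & 0 & -7 \\ 159 & 73 & -44 & -9 & 2 \\ -195 & -87 & 55 & 10 & -9 \end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & -3 \\ 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \\
\mathcal{E}_M(-4) &= \mathcal{N}(M - (-4)I_5) = \left\langle \left\{ \begin{bmatrix} -1 \\ 3 \\ 1 \\ 2 \\ 1 \end{bmatrix} \right\} \right\rangle \\
\lambda = 3 + 2i: \quad M - (3 + 2i)I_5 &= \begin{bmatrix} -49 - 2i & -22 & 13 & 5 & 1 \\ 117 & 54 - 2i & -32 & -15 & -4 \\ -69 & -29 & 18 - 2i & 0 & -7 \\ 159 & 73 & -44 & -16 - 2i & 2 \\ -195 & -87 & 55 & 10 & -16 - 2i \end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} 1 & 0 & 0 & 0 & -\frac{3}{4} + \frac{i}{4} \\ 0 & 1 & 0 & 0 & \frac{7}{4} - \frac{i}{4} \\ 0 & 0 & 1 & 0 & -\frac{1}{2} + \frac{i}{2} \\ 0 & 0 & 0 & 1 & \frac{7}{4} - \frac{i}{4} \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \\
\mathcal{E}_M(3 + 2i) &= \mathcal{N}(M - (3 + 2i)I_5) = \left\langle \left\{ \begin{bmatrix} \frac{3}{4} - \frac{i}{4} \\ -\frac{7}{4} + \frac{i}{4} \\ \frac{1}{2} - \frac{i}{2} \\ -\frac{7}{4} + \frac{i}{4} \\ 1 \end{bmatrix} \right\} \right\rangle = \left\langle \left\{ \begin{bmatrix} 3 - i \\ -7 + i \\ 2 - 2i \\ -7 + i \\ 4 \end{bmatrix} \right\} \right\rangle \\
\lambda = 3 - 2i: \quad M - (3 - 2i)I_5 &= \begin{bmatrix} -49 + 2i & -22 & 13 & 5 & 1 \\ 117 & 54 + 2i & -32 & -15 & -4 \\ -69 & -29 & 18 + 2i & 0 & -7 \\ 159 & 73 & -44 & -16 + 2i & 2 \\ -195 & -87 & 55 & 10 & -16 + 2i \end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} 1 & 0 & 0 & 0 & -\frac{3}{4} - \frac{i}{4} \\ 0 & 1 & 0 & 0 & \frac{7}{4} + \frac{i}{4} \\ 0 & 0 & 1 & 0 & -\frac{1}{2} - \frac{i}{2} \\ 0 & 0 & 0 & 1 & \frac{7}{4} + \frac{i}{4} \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \\
\mathcal{E}_M(3 - 2i) &= \mathcal{N}(M - (3 - 2i)I_5) = \left\langle \left\{ \begin{bmatrix} \frac{3}{4} + \frac{i}{4} \\ -\frac{7}{4} - \frac{i}{4} \\ \frac{1}{2} + \frac{i}{2} \\ -\frac{7}{4} - \frac{i}{4} \\ 1 \end{bmatrix} \right\} \right\rangle = \left\langle \left\{ \begin{bmatrix} 3 + i \\ -7 - i \\ 2 + 2i \\ -7 - i \\ 4 \end{bmatrix} \right\} \right\rangle
\end{align*}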


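It is straightforward to convert each of these basis vectors for eigenspaces of $M$ back to elements of $P_4$ by applying the isomorphism $\rho_B^{-1}$,

\begin{align*}
\rho_B^{-1}\left(\begin{bmatrix} -1 \\ 5 \\ 4 \\ 2 \\ 0 \end{bmatrix}\right) &= -1 + 5x + 4x^2 + 2x^3 \\
\rho_B^{-1}\left(\begin{bmatrix} 1 \\ 5 \\ 12 \\ 0 \\ 2 \end{bmatrix}\right) &= 1 + 5x + 12x^2 + 2x^4 \\
\rho_B^{-1}\left(\begin{bmatrix} -1 \\ 3 \\ 1 \\ 2 \\ 1 \end{bmatrix}\right) &= -1 + 3x + x^2 + 2x^3 + x^4 \\
\rho_B^{-1}\left(\begin{bmatrix} 3 - i \\ -7 + i \\ 2 - 2i \\ -7 + i \\ 4 \end{bmatrix}\right) &= (3 - i) + (-7 + i)x + (2 - 2i)x^2 + (-7 + i)x^3 + 4x^4 \\
\rho_B^{-1}\left(\begin{bmatrix} 3 + i \\ -7 - i \\ 2 + 2i \\ -7 - i \\ 4 \end{bmatrix}\right) &= (3 + i) + (-7 - i)x + (2 + 2i)x^2 + (-7 - i)x^3 + 4x^4
\end{align*}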


So we apply Theorem EER and the Coordinatization Principle to get the
eigenspaces for Q,

\begin{align*}
\mathcal{E}_Q(2) &= \left\langle\left\{ -1 + 5x + 4x^2 + 2x^3,\ 1 + 5x + 12x^2 + 2x^4 \right\}\right\rangle \\
\mathcal{E}_Q(-4) &= \left\langle\left\{ -1 + 3x + x^2 + 2x^3 + x^4 \right\}\right\rangle \\
\mathcal{E}_Q(3 + 2i) &= \left\langle\left\{ (3 - i) + (-7 + i)x + (2 - 2i)x^2 + (-7 + i)x^3 + 4x^4 \right\}\right\rangle \\
\mathcal{E}_Q(3 - 2i) &= \left\langle\left\{ (3 + i) + (-7 - i)x + (2 + 2i)x^2 + (-7 - i)x^3 + 4x^4 \right\}\right\rangle
\end{align*}

with geometric multiplicities

\[
\gamma_Q(2) = 2 \qquad \gamma_Q(-4) = 1 \qquad \gamma_Q(3 + 2i) = 1 \qquad \gamma_Q(3 - 2i) = 1
\]

△
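The factorization of the characteristic polynomial in this example can be checked mechanically. The following is a minimal sketch in plain Python, not the Sage workflow the text relies on elsewhere; the helper names `poly_mul` and `poly_eval` are hypothetical, and polynomials are stored as coefficient lists with the constant term first.

```python
from functools import reduce

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

def poly_eval(p, x):
    """Evaluate a polynomial (lowest degree first) at x using Horner's rule."""
    value = 0
    for coeff in reversed(p):
        value = value * x + coeff
    return value

# Factors of -(x - 2)^2 (x + 4)(x^2 - 6x + 13), each lowest degree first.
factors = [[-1], [-2, 1], [-2, 1], [4, 1], [13, -6, 1]]
expanded = reduce(poly_mul, factors)

# Coefficients of -208 + 252x - 88x^2 - x^3 + 6x^4 - x^5.
stated = [-208, 252, -88, -1, 6, -1]

# The four distinct eigenvalues claimed in the example.
roots = [2, -4, 3 + 2j, 3 - 2j]
root_values = [poly_eval(stated, r) for r in roots]

print(expanded == stated)
print(root_values)
```

Expanding the factored form and comparing coefficient lists confirms the two displayed forms agree, and evaluating at each claimed eigenvalue confirms it is a root.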


Reading  Questions 


1. The change-of-basis matrix is a matrix representation of which linear transformation?

2. Find the change-of-basis matrix, $C_{B,C}$, for the two bases of $\mathbb{C}^2$


B = 


-1 

2 


C = 


3.  What  is  the  third  “surprise,”  and  why  is  it  surprising? 


Exercises 

C20 In Example CBCV we computed the vector representation of y relative to C, $\rho_C(\mathbf{y})$,
as an example of Theorem CB. Compute this same representation directly. In other words,
apply Definition VR rather than Theorem CB.

C21† Perform a check on Example MRCM by computing $M^Q_{B,D}$ directly. In other words,
apply Definition MR rather than Theorem MRCB.

C30† Find a basis for the vector space $P_3$ composed of eigenvectors of the linear
transformation T. Then find a matrix representation of T relative to this basis.

\[
T\colon P_3 \to P_3, \quad T(a + bx + cx^2 + dx^3) = (a + c + d) + (b + c + d)x + (a + b + c)x^2 + (a + b + d)x^3
\]

C40† Let $S_{22}$ be the vector space of 2 × 2 symmetric matrices. Find a basis C for $S_{22}$
that yields a diagonal matrix representation of the linear transformation R.

\[
R\colon S_{22} \to S_{22}, \quad
R\left(\begin{bmatrix} a & b \\ b & c \end{bmatrix}\right) =
\begin{bmatrix} -5a + 2b - 3c & -12a + 5b - 6c \\ -12a + 5b - 6c & 6a - 2b + 4c \end{bmatrix}
\]

C41† Let $S_{22}$ be the vector space of 2 × 2 symmetric matrices. Find a basis for $S_{22}$
composed of eigenvectors of the linear transformation $Q\colon S_{22} \to S_{22}$.

\[
Q\left(\begin{bmatrix} a & b \\ b & c \end{bmatrix}\right) =
\begin{bmatrix} 25a + 18b + 30c & -16a - 11b - 20c \\ -16a - 11b - 20c & -11a - 9b - 12c \end{bmatrix}
\]

T10† Suppose that $T\colon V \to V$ is an invertible linear transformation with a nonzero
eigenvalue λ. Prove that $\frac{1}{\lambda}$ is an eigenvalue of $T^{-1}$.

T15 Suppose that V is a vector space and $T\colon V \to V$ is a linear transformation. Prove
that T is injective if and only if λ = 0 is not an eigenvalue of T.

Section  OD 

Orthonormal  Diagonalization 


We  have  seen  in  Section  SD  that  under  the  right  conditions  a square  matrix  is 
similar  to  a diagonal  matrix.  We  recognize  now,  via  Theorem  SCB,  that  a similarity 
transformation  is  a change  of  basis  on  a matrix  representation.  So  we  can  now  discuss 
the  choice  of  a basis  used  to  build  a matrix  representation,  and  decide  if  some  bases 
are  better  than  others  for  this  purpose.  This  will  be  the  tone  of  this  section.  We  will 
also  see  that  every  matrix  has  a reasonably  useful  matrix  representation,  and  we  will 
discover  a new  class  of  diagonalizable  linear  transformations.  First  we  need  some 
basic  facts  about  triangular  matrices. 

Subsection  TM 
Triangular  Matrices 

An  upper,  or  lower,  triangular  matrix  is  exactly  what  it  sounds  like  it  should  be, 
but  here  are  the  two  relevant  definitions. 

Definition  UTM  Upper  Triangular  Matrix 

The n × n square matrix A is upper triangular if $[A]_{ij} = 0$ whenever i > j. □

Definition LTM Lower Triangular Matrix

The n × n square matrix A is lower triangular if $[A]_{ij} = 0$ whenever i < j. □

Obviously, properties of lower triangular matrices will have analogues for upper
triangular  matrices.  Rather  than  stating  two  very  similar  theorems,  we  will  say  that 
matrices  are  “triangular  of  the  same  type”  as  a convenient  shorthand  to  cover  both 
possibilities  and  then  give  a proof  for  just  one  type. 

Theorem  PTMT  Product  of  Triangular  Matrices  is  Triangular 

Suppose  that  A and  B are  square  matrices  of  size  n that  are  triangular  of  the  same 

type.  Then  AB  is  also  triangular  of  that  type. 

Proof.  We  prove  this  for  lower  triangular  matrices  and  leave  the  proof  for  upper 
triangular  matrices  to  you.  Suppose  that  A and  B are  both  lower  triangular.  We 
need  only  establish  that  certain  entries  of  the  product  AB  are  zero.  Suppose  that 
i < j,  then 


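\begin{align*}
[AB]_{ij} &= \sum_{k=1}^{n} [A]_{ik}[B]_{kj} && \text{Theorem EMP} \\
&= \sum_{k=1}^{j-1} [A]_{ik}[B]_{kj} + \sum_{k=j}^{n} [A]_{ik}[B]_{kj} && \text{Property AACN} \\
&= \sum_{k=1}^{j-1} [A]_{ik}\,0 + \sum_{k=j}^{n} [A]_{ik}[B]_{kj} && k < j,\ \text{Definition LTM} \\
&= \sum_{k=1}^{j-1} [A]_{ik}\,0 + \sum_{k=j}^{n} 0\,[B]_{kj} && i < j \le k,\ \text{Definition LTM} \\
&= \sum_{k=1}^{j-1} 0 + \sum_{k=j}^{n} 0 \\
&= 0
\end{align*}

Since $[AB]_{ij} = 0$ whenever i < j, by Definition LTM, AB is lower triangular. ■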
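Theorem PTMT can be spot-checked on a small instance. This is an illustrative sketch in plain Python with two arbitrarily chosen 3 × 3 lower triangular matrices; the helper names `mat_mul` and `is_lower_triangular` are hypothetical and the matrices are not taken from the text.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows (Theorem EMP, entry by entry)."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_lower_triangular(M):
    """Definition LTM: every entry strictly above the diagonal (i < j) is zero."""
    n = len(M)
    return all(M[i][j] == 0 for i in range(n) for j in range(n) if i < j)

A = [[2, 0, 0],
     [1, 3, 0],
     [4, -1, 5]]
B = [[1, 0, 0],
     [-2, 1, 0],
     [0, 6, -3]]

AB = mat_mul(A, B)
print(AB)                       # [[2, 0, 0], [-5, 3, 0], [6, 29, -15]]
print(is_lower_triangular(AB))  # True
```

Note also that each diagonal entry of the product is the product of the corresponding diagonal entries of A and B, which follows from the same splitting of the sum used in the proof.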


The  inverse  of  a triangular  matrix  is  triangular,  of  the  same  type. 


Theorem  ITMT  Inverse  of  a Triangular  Matrix  is  Triangular 
Suppose that A is a nonsingular matrix of size n that is triangular. Then the inverse
of A, $A^{-1}$, is triangular of the same type. Furthermore, the diagonal entries of
$A^{-1}$ are the reciprocals of the corresponding diagonal entries of A. More precisely,
$[A^{-1}]_{ii} = [A]_{ii}^{-1}$.


Proof. We give the proof for the case when A is lower triangular, and leave the case
when A is upper triangular for you. Consider the process for computing the inverse of
a matrix that is outlined in the proof of Theorem CINM. We augment A with the size
n identity matrix, and row-reduce the n × 2n matrix to reduced row-echelon form
via the algorithm in Theorem REMEF. The proof involves tracking the peculiarities
of this process in the case of a lower triangular matrix. Let $M = [A \mid I_n]$.

First, none of the diagonal elements of A are zero. By repeated expansion about
the first row, the determinant of a lower triangular matrix can be seen to be the
product of the diagonal entries (Theorem DER). If just one of these diagonal elements
were zero, then the determinant of A would be zero and A would be singular by
Theorem SMZD. Slightly violating the exact algorithm for row reduction we can form
a matrix, M', that is row-equivalent to M, by multiplying row i by the nonzero scalar
$[A]_{ii}^{-1}$, for 1 ≤ i ≤ n. This sets $[M']_{ii} = 1$ and $[M']_{i,n+i} = [A]_{ii}^{-1}$,
and leaves every zero entry of M unchanged.

Let $M_j$ denote the matrix obtained from M' after converting column j to a pivot
column (so $M_0 = M'$). We can convert column j of $M_{j-1}$ into a pivot column with a
set of n − j row operations of the form $\alpha R_j + R_k$ with j + 1 ≤ k ≤ n. The key
observation here is that we add multiples of row j only to higher-numbered rows. This
means that none of the entries in rows 1 through j − 1 is changed, and since row j has
zeros in columns j + 1 through n, none of the entries in rows j + 1 through n is changed
in columns j + 1 through n. The first n columns of M' form a lower triangular
matrix with 1's on the diagonal. In its conversion to the identity matrix through
this sequence of row operations, it remains lower triangular with 1's on the diagonal.

What happens in columns n + 1 through 2n of M'? These columns began in M
as the identity matrix, and in M' each diagonal entry was scaled to a reciprocal of
the corresponding diagonal entry of A. Notice that trivially, these final n columns
of M' form a lower triangular matrix. Just as we argued for the first n columns,
the row operations that convert $M_{j-1}$ into $M_j$ will preserve the lower triangular
form in the final n columns and preserve the exact values of the diagonal entries. By
Theorem CINM, the final n columns of $M_n$ form the inverse of A, and this matrix has
the necessary properties advertised in the conclusion of this theorem. ■
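The conclusion of Theorem ITMT can be illustrated on a concrete matrix. The sketch below does not follow the row-reduction bookkeeping of the proof; as a stated substitution, it inverts a lower triangular matrix by forward substitution, column by column, using exact rational arithmetic. The function name `lower_triangular_inverse` and the sample matrix are hypothetical.

```python
from fractions import Fraction

def lower_triangular_inverse(A):
    """Invert a nonsingular lower triangular matrix by forward substitution."""
    n = len(A)
    inv = [[Fraction(0)] * n for _ in range(n)]
    for col in range(n):
        # Solve A x = e_col. Rows above `col` of the solution are zero,
        # so the inverse is again lower triangular.
        for row in range(col, n):
            rhs = Fraction(1 if row == col else 0)
            partial = sum(A[row][k] * inv[k][col] for k in range(col, row))
            inv[row][col] = (rhs - partial) / A[row][row]
    return inv

A = [[2, 0, 0],
     [1, 4, 0],
     [3, -2, 5]]
A_inv = lower_triangular_inverse(A)

for row in A_inv:
    print([str(entry) for entry in row])
```

The diagonal of the computed inverse is 1/2, 1/4, 1/5, the reciprocals of the diagonal entries 2, 4, 5 of A, and every entry above the diagonal is zero.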


Subsection UTMR
Upper Triangular Matrix Representation

Not every matrix is diagonalizable, but every linear transformation has a matrix
representation that is an upper triangular matrix, and the basis that achieves this
representation is especially pleasing. Here is the theorem.

Theorem  UTMR  Upper  Triangular  Matrix  Representation 
Suppose that $T\colon V \to V$ is a linear transformation. Then there is a basis B for V
such that the matrix representation of T relative to B, $M^T_{B,B}$, is an upper triangular
matrix. Each diagonal entry is an eigenvalue of T, and if λ is an eigenvalue of T,
then λ occurs $\alpha_T(\lambda)$ times on the diagonal.

Proof. We begin with a proof by induction (Proof Technique I) of the first statement
in the conclusion of the theorem. We use induction on the dimension of V to show
that if $T\colon V \to V$ is a linear transformation, then there is a basis B for V such that
the matrix representation of T relative to B, $M^T_{B,B}$, is an upper triangular matrix.

To start suppose that dim(V) = 1. Choose any nonzero vector $\mathbf{v} \in V$ and realize
that $V = \langle\{\mathbf{v}\}\rangle$. Then $T(\mathbf{v}) = \beta\mathbf{v}$ for some $\beta \in \mathbb{C}$, which determines T uniquely
(Theorem LTDB). This description of T also gives us a matrix representation relative
to the basis $B = \{\mathbf{v}\}$ as the 1 × 1 matrix with lone entry equal to β. And this matrix
representation is upper triangular (Definition UTM).

For the induction step let dim(V) = m, and assume the theorem is true for every
linear transformation defined on a vector space of dimension less than m. By Theorem
EMHE (suitably converted to the setting of a linear transformation), T has at least
one eigenvalue, and we denote this eigenvalue as λ. (We will remark later about
how critical this step is.) We now consider properties of the linear transformation
$T - \lambda I_V\colon V \to V$.

Let $\mathbf{x}$ be an eigenvector of T for λ. By definition $\mathbf{x} \neq \mathbf{0}$. Then

\begin{align*}
(T - \lambda I_V)(\mathbf{x}) &= T(\mathbf{x}) - \lambda I_V(\mathbf{x}) && \text{Theorem VSLT} \\
&= T(\mathbf{x}) - \lambda\mathbf{x} && \text{Definition IDLT} \\
&= \lambda\mathbf{x} - \lambda\mathbf{x} && \text{Definition EELT} \\
&= \mathbf{0} && \text{Property AI}
\end{align*}

So $T - \lambda I_V$ is not injective, as it has a nontrivial kernel (Theorem KILT). With
an application of Theorem RPNDD we bound the rank of $T - \lambda I_V$,

\[
r(T - \lambda I_V) = \dim(V) - n(T - \lambda I_V) \le m - 1
\]

Let W be the subspace of V that is the range of $T - \lambda I_V$, $W = \mathcal{R}(T - \lambda I_V)$,
and define $k = \dim(W) \le m - 1$. We define a new linear transformation S, on W,

\[
S\colon W \to W, \quad S(\mathbf{w}) = T(\mathbf{w})
\]


This does not look like we have accomplished much, since the action of S is identical
to  the  action  of  T.  For  our  purposes  this  will  be  a good  thing.  What  is  different  is 
the  domain  and  codomain.  S is  defined  on  W,  a vector  space  with  dimension  less 
than  m,  and  so  is  susceptible  to  our  induction  hypothesis.  Verifying  that  S is  really  a 
linear  transformation  is  almost  entirely  routine,  with  one  exception.  Employing  T in 
our  definition  of  S raises  the  possibility  that  the  outputs  of  S will  not  be  contained 
within  W (but  instead  will  lie  inside  V , but  outside  W).  To  examine  this  possibility, 
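suppose that $\mathbf{w} \in W$.

\begin{align*}
S(\mathbf{w}) &= T(\mathbf{w}) \\
&= T(\mathbf{w}) + \mathbf{0} && \text{Property Z} \\
&= T(\mathbf{w}) + (\lambda I_V(\mathbf{w}) - \lambda I_V(\mathbf{w})) && \text{Property AI} \\
&= (T(\mathbf{w}) - \lambda I_V(\mathbf{w})) + \lambda I_V(\mathbf{w}) && \text{Property AA} \\
&= (T(\mathbf{w}) - \lambda I_V(\mathbf{w})) + \lambda\mathbf{w} && \text{Definition IDLT} \\
&= (T - \lambda I_V)(\mathbf{w}) + \lambda\mathbf{w} && \text{Theorem VSLT}
\end{align*}

Since W is the range of $T - \lambda I_V$, $(T - \lambda I_V)(\mathbf{w}) \in W$. And by Property SC,
$\lambda\mathbf{w} \in W$. Finally, applying Property AC we see by closure that the sum is in W and
so we conclude that $S(\mathbf{w}) \in W$. This argument convinces us that it is legitimate to
define S as we did with W as the codomain.

S is a linear transformation defined on a vector space with dimension k, less
than m, so we can apply the induction hypothesis and conclude that W has a basis,
$C = \{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3, \ldots, \mathbf{w}_k\}$, such that the matrix representation of S relative to C
is an upper triangular matrix.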

Beginning with the linearly independent set C, repeatedly apply Theorem ELIS
to add vectors to C, maintaining a linearly independent set and spanning ever larger
subspaces of V. This process will end with the addition of m − k vectors, which together
with C will span all of V. Denote these vectors as $D = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_{m-k}\}$.
Then B = C ∪ D is a basis for V, and is the basis we desire for the conclusion of
the theorem. So we now consider the matrix representation of T relative to B.

Since the definitions of T and S agree on W, the first k columns of $M^T_{B,B}$ will have
the upper triangular matrix representation of S in the first k rows. The remaining
m − k rows of these first k columns will be all zeros since the outputs of T for basis
vectors from C are all contained in W and hence are linear combinations of the basis
vectors in C. The situation for T on the basis vectors in D is not quite as pretty,
but it is close.

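For 1 ≤ j ≤ m − k, consider

\begin{align*}
\rho_B(T(\mathbf{u}_j)) &= \rho_B(T(\mathbf{u}_j) + \mathbf{0}) && \text{Property Z} \\
&= \rho_B(T(\mathbf{u}_j) + (\lambda I_V(\mathbf{u}_j) - \lambda I_V(\mathbf{u}_j))) && \text{Property AI} \\
&= \rho_B((T(\mathbf{u}_j) - \lambda I_V(\mathbf{u}_j)) + \lambda I_V(\mathbf{u}_j)) && \text{Property AA} \\
&= \rho_B((T(\mathbf{u}_j) - \lambda I_V(\mathbf{u}_j)) + \lambda\mathbf{u}_j) && \text{Definition IDLT} \\
&= \rho_B((T - \lambda I_V)(\mathbf{u}_j) + \lambda\mathbf{u}_j) && \text{Theorem VSLT} \\
&= \rho_B(a_1\mathbf{w}_1 + a_2\mathbf{w}_2 + a_3\mathbf{w}_3 + \cdots + a_k\mathbf{w}_k + \lambda\mathbf{u}_j) && \text{Definition RLT} \\
&= \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_k \\ 0 \\ \vdots \\ 0 \\ \lambda \\ 0 \\ \vdots \\ 0 \end{bmatrix} && \text{Definition VR}
\end{align*}

where the entry λ sits in position k + j. In the penultimate equality, we have rewritten
an element of the range of $T - \lambda I_V$ as a linear combination of the basis vectors, C,
for the range of $T - \lambda I_V$, W, using the scalars $a_1, a_2, a_3, \ldots, a_k$. If we incorporate
these m − k column vectors into the matrix representation $M^T_{B,B}$ we find m − k
occurrences of λ on the diagonal, and any nonzero entries lying only in the first k rows.
Together with the k × k upper triangular representation in the upper left-hand corner,
the entire matrix representation for T is clearly upper triangular. This completes the
induction step. So for any linear transformation there is a basis that creates an upper
triangular matrix representation.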

We have one more statement in the conclusion of the theorem to verify. The
eigenvalues of T, and their multiplicities, can be computed with the techniques of
Chapter E relative to any matrix representation (Theorem EER). We take this
approach with our upper triangular matrix representation $M^T_{B,B}$. Let $d_i$ be the
diagonal entry of $M^T_{B,B}$ in row i and column i. Then the characteristic polynomial,
computed as a determinant (Definition CP) with repeated expansions about the
first column, is

\[
p_{M^T_{B,B}}(x) = (d_1 - x)(d_2 - x)(d_3 - x) \cdots (d_m - x)
\]

The roots of the polynomial equation $p_{M^T_{B,B}}(x) = 0$ are the eigenvalues of the
linear transformation (Theorem EMRCP). So each diagonal entry is an eigenvalue,
and each eigenvalue λ is repeated on the diagonal exactly $\alpha_T(\lambda)$ times (Definition AME). ■

A key  step  in  this  proof  was  the  construction  of  the  subspace  W with  dimension 
strictly  less  than  that  of  V.  This  required  an  eigenvalue/eigenvector  pair,  which  was 
guaranteed  to  us  by  Theorem  EMHE.  Digging  deeper,  the  proof  of  Theorem  EMHE 
requires  that  we  can  factor  polynomials  completely,  into  linear  factors.  This  will  not 
always  happen  if  our  set  of  scalars  is  the  reals,  R.  So  this  is  our  final  explanation  of 
our  choice  of  the  complex  numbers,  C,  as  our  set  of  scalars.  In  C polynomials  factor 
completely,  so  every  matrix  has  at  least  one  eigenvalue,  and  an  inductive  argument 
will  get  us  to  upper  triangular  matrix  representations. 

In the case of linear transformations defined on $\mathbb{C}^m$, we can use the inner product
(Definition  IP)  profitably  to  fine-tune  the  basis  that  yields  an  upper  triangular  matrix 
representation.  Recall  that  the  adjoint  of  matrix  A (Definition  A)  is  written  as  A*. 

Theorem  OBUTR  Orthonormal  Basis  for  Upper  Triangular  Representation 
Suppose  that  A is  a square  matrix.  Then  there  is  a unitary  matrix  U , and  an  upper 
triangular  matrix  T,  such  that 

U*AU  = T 

and  T has  the  eigenvalues  of  A as  the  entries  of  the  diagonal. 

Proof. This theorem is a statement about matrices and similarity. We can convert
it to a statement about linear transformations, matrix representations and bases
(Theorem SCB). Suppose that A is an n × n matrix, and define the linear
transformation $S\colon \mathbb{C}^n \to \mathbb{C}^n$ by $S(\mathbf{x}) = A\mathbf{x}$. Then Theorem UTMR gives us a basis
$B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_n\}$ for $\mathbb{C}^n$ such that a matrix representation of S relative to
B, $M^S_{B,B}$, is upper triangular.

Now convert the basis B into an orthogonal basis, C, by an application of the
Gram-Schmidt procedure (Theorem GSP). This is a messy business computationally,
but here we have an excellent illustration of the power of the Gram-Schmidt procedure.
We need only be sure that B is linearly independent and spans $\mathbb{C}^n$, and then we know
that C is linearly independent, spans $\mathbb{C}^n$ and is also an orthogonal set. We will now
consider the matrix representation of S relative to C (rather than B). Write the new
basis as $C = \{\mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3, \ldots, \mathbf{y}_n\}$. The application of the Gram-Schmidt procedure
creates each vector of C, say $\mathbf{y}_j$, as the difference of $\mathbf{v}_j$ and a linear combination
of $\mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3, \ldots, \mathbf{y}_{j-1}$. We are not concerned here with the actual values of the
scalars in this linear combination, so we will write

\[
\mathbf{y}_j = \mathbf{v}_j - \sum_{k=1}^{j-1} b_{jk}\mathbf{y}_k
\]

where the $b_{jk}$ are shorthand for the scalars. The equation above is in a form useful
for creating the basis C from B. To better understand the relationship between B
and C convert it to read

\[
\mathbf{v}_j = \mathbf{y}_j + \sum_{k=1}^{j-1} b_{jk}\mathbf{y}_k
\]

In this form, we recognize that the change-of-basis matrix $C_{B,C} = M^{I_{\mathbb{C}^n}}_{B,C}$
(Definition CBM) is an upper triangular matrix. By Theorem SCB we have

\[
M^S_{C,C} = C_{B,C}\, M^S_{B,B}\, C_{B,C}^{-1}
\]

The inverse of an upper triangular matrix is upper triangular (Theorem ITMT), and
the product of two upper triangular matrices is again upper triangular (Theorem
PTMT). So $M^S_{C,C}$ is an upper triangular matrix.

Now, multiply each vector of C by a nonzero scalar, so that the result has norm
1. In this way we create a new basis D which is an orthonormal set (Definition ONS).
Note that the change-of-basis matrix $C_{C,D}$ is a diagonal matrix with nonzero entries
equal to the norms of the vectors in C.

Now we can convert our results into the language of matrices. Let E be the basis
of $\mathbb{C}^n$ formed with the standard unit vectors (Definition SUV). Then the matrix
representation of S relative to E is simply A, $A = M^S_{E,E}$. The change-of-basis matrix
$C_{D,E}$ has columns that are simply the vectors in D, the orthonormal basis. As such,
Theorem CUMOS tells us that $C_{D,E}$ is a unitary matrix, and by Definition UM has
an inverse equal to its adjoint. Write $U = C_{D,E}$. We have


\begin{align*}
U^*AU &= U^{-1}AU && \text{Theorem UMI} \\
&= C_{D,E}^{-1}\, M^S_{E,E}\, C_{D,E} \\
&= M^S_{D,D} && \text{Theorem SCB} \\
&= C_{C,D}\, M^S_{C,C}\, C_{C,D}^{-1} && \text{Theorem SCB}
\end{align*}

The inverse of a diagonal matrix is also a diagonal matrix, and so this final
expression is the product of three upper triangular matrices, and so is again upper
triangular (Theorem PTMT). Thus the desired upper triangular matrix, T, is the
matrix representation of S relative to the orthonormal basis D, $M^S_{D,D}$. ■

Subsection  NM 
Normal  Matrices 

Normal  matrices  comprise  a broad  class  of  interesting  matrices,  many  of  which  we 
have  met  already.  But  they  are  most  interesting  since  they  define  exactly  which 
matrices  we  can  diagonalize  via  a unitary  matrix.  This  is  the  upcoming  Theorem 
OD. Here is the definition.

Definition NRML Normal Matrix

The  square  matrix  A is  normal  if  A*  A = AA*.  □ 

So  a normal  matrix  commutes  with  its  adjoint.  Part  of  the  beauty  of  this  definition 
is  that  it  includes  many  other  types  of  matrices.  A diagonal  matrix  will  commute 
with  its  adjoint,  since  the  adjoint  is  again  diagonal  and  the  entries  are  just  conjugates 
of  the  entries  of  the  original  diagonal  matrix.  A Hermitian  (self-adjoint)  matrix 
(Definition  HM)  will  trivially  commute  with  its  adjoint,  since  the  two  matrices  are 
the  same.  A real,  symmetric  matrix  is  Hermitian,  so  these  matrices  are  also  normal. 
A unitary  matrix  (Definition  UM)  has  its  adjoint  as  its  inverse,  and  inverses  commute 
(Theorem OSIS), so unitary matrices are normal. Another class of normal matrices is
the real skew-symmetric matrices. However, these broad descriptions still do not capture
all  of  the  normal  matrices,  as  the  next  example  shows. 

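Example ANM A normal matrix
Let

\[
A = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}
\]

Then

\[
AA^* =
\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}
= \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}
= A^*A
\]

so we see by Definition NRML that A is normal. However, A is not symmetric (hence,
as a real matrix, not Hermitian), not unitary, and not skew-symmetric. △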
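Definition NRML is easy to test directly. The sketch below assumes the matrix of Example ANM is the real 2 × 2 matrix with rows (1, −1) and (1, 1); the helper names `adjoint`, `mat_mul` and `is_normal` are hypothetical, and the second matrix `N` is an extra illustration that is not in the text.

```python
def adjoint(M):
    """Conjugate transpose of a square matrix stored as a list of rows."""
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_normal(M):
    """Definition NRML: M commutes with its adjoint."""
    return mat_mul(adjoint(M), M) == mat_mul(M, adjoint(M))

A = [[1, -1],
     [1, 1]]          # assumed matrix of Example ANM
N = [[1, 1],
     [0, 1]]          # upper triangular, not diagonal

print(mat_mul(adjoint(A), A))  # [[2, 0], [0, 2]]
print(is_normal(A))            # True
print(is_normal(N))            # False
```

Since the product of A with its adjoint is twice the identity rather than the identity, A is not unitary, and A is visibly neither symmetric nor skew-symmetric.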

Subsection  OD 
Orthonormal  Diagonalization 

A diagonal matrix is very easy to work with in matrix multiplication (Example
HPDM) and an orthonormal basis also has many advantages (Theorem COB). How
about converting a matrix to a diagonal matrix through a similarity transformation
using a unitary matrix (i.e. build a diagonal matrix representation with an
orthonormal basis)? That would be fantastic! When can we do this? We can always
accomplish this feat when the matrix is normal, and normal matrices are the only ones
that behave this way. Here is the theorem.

Theorem  OD  Orthonormal  Diagonalization 

Suppose  that  A is  a square  matrix.  Then  there  is  a unitary  matrix  U and  a diagonal 
matrix  D,  with  diagonal  entries  equal  to  the  eigenvalues  of  A,  such  that  U*AU  = D 
if  and  only  if  A is  a normal  matrix. 

Proof. (⇒) Suppose there is a unitary matrix U that diagonalizes A. We would
usually write this condition as U*AU = D, but we will find it convenient in this part
of the proof to use our hypothesis in the equivalent form, A = UDU*. Recall that a
diagonal matrix is normal, and notice that this observation is at the center of the
next sequence of equalities. We check the normality of A,

\begin{align*}
A^*A &= (UDU^*)^*(UDU^*) && \text{Hypothesis} \\
&= (U^*)^* D^* U^* U D U^* && \text{Theorem MMAD} \\
&= U D^* U^* U D U^* && \text{Theorem AA} \\
&= U D^* I_n D U^* && \text{Definition UM} \\
&= U D^* D U^* && \text{Theorem MMIM} \\
&= U D D^* U^* && \text{Definition NRML} \\
&= U D I_n D^* U^* && \text{Theorem MMIM} \\
&= U D U^* U D^* U^* && \text{Definition UM} \\
&= U D U^* (U^*)^* D^* U^* && \text{Theorem AA} \\
&= (UDU^*)(UDU^*)^* && \text{Theorem MMAD} \\
&= AA^* && \text{Hypothesis}
\end{align*}

So by Definition NRML, A is a normal matrix.

(⇐) For the converse, suppose that A is a normal matrix. Whether or not A is
normal, Theorem OBUTR provides a unitary matrix U and an upper triangular
matrix T, whose diagonal entries are the eigenvalues of A, and such that U*AU = T.
With the added condition that A is normal, we will determine that the entries of T
above the diagonal must be all zero. Here we go.

First notice that Definition UM implies that the inverse of a unitary matrix U is
the adjoint, U*, so the product of these two matrices, in either order, is the identity
matrix (Theorem OSIS). We begin by showing that T is normal.

\begin{align*}
T^*T &= (U^*AU)^*(U^*AU) && \text{Theorem OBUTR} \\
&= U^*A^*(U^*)^*U^*AU && \text{Theorem MMAD} \\
&= U^*A^*UU^*AU && \text{Theorem AA} \\
&= U^*A^*I_nAU && \text{Definition UM} \\
&= U^*A^*AU && \text{Theorem MMIM} \\
&= U^*AA^*U && \text{Definition NRML} \\
&= U^*AI_nA^*U && \text{Theorem MMIM} \\
&= U^*AUU^*A^*U && \text{Definition UM} \\
&= U^*AUU^*A^*(U^*)^* && \text{Theorem AA} \\
&= (U^*AU)(U^*AU)^* && \text{Theorem MMAD} \\
&= TT^* && \text{Theorem OBUTR}
\end{align*}

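So by Definition NRML, T is a normal matrix.

We can translate the normality of T into the statement $TT^* - T^*T = \mathcal{O}$. We
now establish an equality we will use repeatedly. For 1 ≤ i ≤ n,

\begin{align*}
0 &= [\mathcal{O}]_{ii} && \text{Definition ZM} \\
&= [TT^* - T^*T]_{ii} && \text{Definition NRML} \\
&= [TT^*]_{ii} - [T^*T]_{ii} && \text{Definition MA} \\
&= \sum_{k=1}^{n} [T]_{ik}[T^*]_{ki} - \sum_{k=1}^{n} [T^*]_{ik}[T]_{ki} && \text{Theorem EMP} \\
&= \sum_{k=1}^{n} [T]_{ik}\overline{[T]_{ik}} - \sum_{k=1}^{n} \overline{[T]_{ki}}[T]_{ki} && \text{Definition A} \\
&= \sum_{k=i}^{n} [T]_{ik}\overline{[T]_{ik}} - \sum_{k=1}^{i} \overline{[T]_{ki}}[T]_{ki} && \text{Definition UTM} \\
&= \sum_{k=i}^{n} \left|[T]_{ik}\right|^2 - \sum_{k=1}^{i} \left|[T]_{ki}\right|^2 && \text{Definition MCN}
\end{align*}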
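As a small worked instance of this equality (an illustration added here, with the entries a, b, c chosen as generic symbols rather than taken from the text), consider the case n = 2 with a general upper triangular T.

```latex
\begin{align*}
T &= \begin{bmatrix} a & b \\ 0 & c \end{bmatrix},
\qquad
T^* = \begin{bmatrix} \overline{a} & 0 \\ \overline{b} & \overline{c} \end{bmatrix} \\
TT^* &= \begin{bmatrix} |a|^2 + |b|^2 & b\overline{c} \\ c\overline{b} & |c|^2 \end{bmatrix},
\qquad
T^*T = \begin{bmatrix} |a|^2 & \overline{a}b \\ \overline{b}a & |b|^2 + |c|^2 \end{bmatrix} \\
0 &= [TT^*]_{11} - [T^*T]_{11} = \left(|a|^2 + |b|^2\right) - |a|^2 = |b|^2
\end{align*}
```

So normality of T forces b = 0, and this 2 × 2 upper triangular matrix is diagonal. The row with i = 1 of the general equality is exactly this computation.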

To conclude, we use the above equality repeatedly, beginning with i = 1, and
discover, row by row, that the entries above the diagonal of T are all zero. The key
observation is that a sum of squared moduli, which are nonnegative real numbers, can
only equal zero when each term of the sum is zero. For i = 1 we have

\[
0 = \sum_{k=1}^{n} \left|[T]_{1k}\right|^2 - \sum_{k=1}^{1} \left|[T]_{k1}\right|^2 = \sum_{k=2}^{n} \left|[T]_{1k}\right|^2
\]

which forces the conclusions

\[
[T]_{12} = 0 \qquad [T]_{13} = 0 \qquad [T]_{14} = 0 \qquad \cdots \qquad [T]_{1n} = 0
\]

For i = 2 we use the same equality, but also incorporate the portion of the above
conclusions that says $[T]_{12} = 0$,

\[
0 = \sum_{k=2}^{n} \left|[T]_{2k}\right|^2 - \sum_{k=1}^{2} \left|[T]_{k2}\right|^2
= \sum_{k=2}^{n} \left|[T]_{2k}\right|^2 - \sum_{k=2}^{2} \left|[T]_{k2}\right|^2
= \sum_{k=3}^{n} \left|[T]_{2k}\right|^2
\]

which forces the conclusions

\[
[T]_{23} = 0 \qquad [T]_{24} = 0 \qquad [T]_{25} = 0 \qquad \cdots \qquad [T]_{2n} = 0
\]

We can repeat this process for the subsequent values of i = 3, 4, 5, . . . , n − 1.
Notice that it is critical we do this in order, since we need to employ portions of each
of the previous conclusions about rows having zero entries in order to successfully get
the same conclusion for later rows. Eventually, we conclude that all of the nondiagonal
entries of T are zero, so the extra assumption of normality forces T to be diagonal. ■

We can rearrange the conclusion of this theorem to read A = UDU*. Recall that
a unitary matrix can be viewed as a geometry-preserving transformation (isometry),
or more loosely as a rotation of sorts. Then a matrix-vector product, Ax, can be
viewed instead as a sequence of three transformations. U* is unitary, and so is a
rotation. Since D is diagonal, it just multiplies each entry of a vector by a scalar.
Diagonal entries that are positive or negative, with absolute values bigger or smaller
than 1 evoke descriptions like reflection, expansion and contraction. Generally we
can say that D “stretches” a vector in each component. Final multiplication by U
undoes (inverts) the rotation performed by U*. So a normal matrix is a
rotation-stretch-rotation transformation.

The orthonormal basis formed from the columns of U can be viewed as a system
of mutually perpendicular axes. The rotation by U* allows the transformation by A
to be replaced by the simple transformation D along these axes, and then U brings
the result back to the original coordinate system. For this reason Theorem OD is
known as the Principal Axis Theorem.
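The factorization A = UDU* can be seen numerically on a small normal matrix. The sketch below assumes the matrix of Example ANM has rows (1, −1) and (1, 1); the unitary matrix U is built from hand-computed unit eigenvectors (1, −i)/√2 and (1, i)/√2 for the eigenvalues 1 + i and 1 − i. These values are an illustration added here, not part of the text, and the arithmetic is floating point, so comparisons should use a tolerance.

```python
import math

def adjoint(M):
    """Conjugate transpose of a square matrix stored as a list of rows."""
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

s = 1 / math.sqrt(2)
A = [[1, -1],
     [1, 1]]
# Columns of U are the unit eigenvectors (1, -i)/sqrt(2) and (1, i)/sqrt(2).
U = [[s, s],
     [-1j * s, 1j * s]]
U_star = adjoint(U)

gram = mat_mul(U_star, U)           # should be close to the identity
D = mat_mul(mat_mul(U_star, A), U)  # should be close to diag(1 + i, 1 - i)

for row in D:
    print(row)
```

Up to rounding, D is diagonal with the eigenvalues 1 + i and 1 − i on its diagonal, so multiplication by A is a "rotation" by U*, a scaling by D, and the inverse "rotation" by U.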

The  columns  of  the  unitary  matrix  in  Theorem  OD  create  an  especially  nice 
basis  for  use  with  the  normal  matrix.  We  record  this  observation  as  a theorem. 

Theorem  OBNM  Orthonormal  Bases  and  Normal  Matrices 

Suppose  that  A is  a normal  matrix  of  size  n.  Then  there  is  an  orthonormal  basis  of 

Cn  composed  of  eigenvectors  of  A. 

Proof.  Let  U be  the  unitary  matrix  promised  by  Theorem  OD  and  let  D be  the 
resulting  diagonal  matrix.  The  desired  set  of  vectors  is  formed  by  collecting  the 
columns  of  U into  a set.  Theorem  CUMOS  says  this  set  of  columns  is  orthonormal. 
Since  U is  nonsingular  (Theorem  UMI),  Theorem  CNMB  says  the  set  is  a basis. 

Since A is diagonalized by U, the diagonal entries of the matrix D are the
eigenvalues  of  A.  An  argument  exactly  like  the  second  half  of  the  proof  of  Theorem 
DC  shows  that  each  vector  of  the  basis  is  an  eigenvector  of  A.  ■ 

In  a vague  way  Theorem  OBNM  is  an  improvement  on  Theorem  HMOE  which 
said  that  eigenvectors  of  a Hermitian  matrix  for  different  eigenvalues  are  always 
orthogonal.  Hermitian  matrices  are  normal  and  we  see  that  we  can  find  at  least 
one  basis  where  every  pair  of  eigenvectors  is  orthogonal.  Notice  that  this  is  not  a 
generalization,  since  Theorem  HMOE  states  a weak  result  which  applies  to  many 
(but  not  all)  pairs  of  eigenvectors,  while  Theorem  OBNM  is  a seemingly  stronger 
result,  but  only  asserts  that  there  is  one  collection  of  eigenvectors  with  the  stronger 
property. 

Given an n × n matrix A, an orthonormal basis for $\mathbb{C}^n$ composed of eigenvectors
of A is an extremely useful basis to have at the service of the matrix A. Why do
we say this? We can consider the vectors of a basis as a preferred set of directions,
known as “axes,” which taken together might also be called a “coordinate system.”
The standard basis of Definition SUV could be considered the default, or prototype,
coordinate system. When a basis is orthonormal, we can consider the directions to
be standardized to have unit length, and we can consider the axes as being mutually
perpendicular. But there is more; let us be a bit more formal.

Suppose  U is  a matrix  whose  columns  are  an  orthonormal  basis  of  eigenvectors 
of  the  n x n matrix  A.  So,  in  particular  U is  a unitary  matrix  (Theorem  CUMOS). 
For a vector $\mathbf{x} \in \mathbb{C}^n$, use the notation $\hat{\mathbf{x}}$ for the vector representation of $\mathbf{x}$ relative
to the orthonormal basis. So the entries of $\hat{\mathbf{x}}$, used in a linear combination of the
columns of U will create $\mathbf{x}$. With Definition MVP, we can write this relationship as

\[
U\hat{\mathbf{x}} = \mathbf{x}
\]

Since U* is the inverse of U (Definition UM), we can rearrange this equation as

\[
\hat{\mathbf{x}} = U^*\mathbf{x}
\]

This says we can easily create the vector representation relative to the orthonormal
basis with a matrix-vector product using the adjoint of $U$. Note that the adjoint is much
easier to compute than a matrix inverse, which would be one general way to obtain
a vector representation. This is our first observation about coordinatization relative
to an orthonormal basis. However, we already knew this, as we just have Theorem COB
in disguise (see Exercise OD.T20).
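
As a small numerical illustration of computing coordinates with the adjoint, here is a Python sketch. The $2 \times 2$ unitary matrix and the vector are hypothetical choices, not examples from the text, and the helper names `adjoint` and `mat_vec` are our own. It only uses built-in complex numbers, and no matrix inverse is ever computed.

```python
import math

s = 1 / math.sqrt(2)
# Hypothetical unitary matrix: its columns (i, 1)/sqrt(2) and (-i, 1)/sqrt(2)
# are orthonormal vectors of C^2.
U = [[1j * s, -1j * s],
     [s, s]]

def adjoint(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    rows, cols = len(M), len(M[0])
    return [[M[r][c].conjugate() for r in range(rows)] for c in range(cols)]

def mat_vec(M, v):
    """Matrix-vector product, computed one row at a time."""
    return [sum(entry * x for entry, x in zip(row, v)) for row in M]

x = [2 + 1j, 3 - 4j]
x_hat = mat_vec(adjoint(U), x)   # coordinates of x relative to the columns of U
x_back = mat_vec(U, x_hat)       # the linear combination of the columns rebuilds x
```

Up to floating-point rounding, `x_back` agrees with `x`, which is the relationship between a vector and its representation written out numerically.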

We  also  know  that  orthonormal  bases  play  nicely  with  inner  products.  Theorem 
UMPIP  says  unitary  matrices  preserve  inner  products  (and  hence  norms).  More 
geometrically,  lengths  and  angles  are  preserved  by  multiplication  by  a unitary  matrix. 
Using  our  notation,  this  becomes 

\[ \langle \hat{x}, \hat{y} \rangle = \langle U\hat{x}, U\hat{y} \rangle = \langle x, y \rangle \]

So we can compute inner products with the original vectors, or with their
representations, and obtain the same result. It follows that norms, lengths, and angles can
all be computed with the original vectors or with the representations in the new
coordinate system based on an orthonormal basis.
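
The preservation of inner products can be checked numerically. The sketch below is only an illustration under stated assumptions: the unitary matrix and the two representation vectors are hypothetical, and the helper `inner` places the conjugate on the first vector, which is an assumed convention (the check succeeds with either convention).

```python
import math

s = 1 / math.sqrt(2)
# Hypothetical unitary matrix with orthonormal columns (i, 1)/sqrt(2), (-i, 1)/sqrt(2).
U = [[1j * s, -1j * s],
     [s, s]]

def mat_vec(M, v):
    """Matrix-vector product, computed one row at a time."""
    return [sum(entry * x for entry, x in zip(row, v)) for row in M]

def inner(u, v):
    """Inner product with the conjugate on the first vector (assumed convention)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

# Start from two representations and build the original vectors from them.
x_hat = [1 + 2j, -3j]
y_hat = [4 - 1j, 2 + 2j]
x, y = mat_vec(U, x_hat), mat_vec(U, y_hat)

same_inner_product = abs(inner(x_hat, y_hat) - inner(x, y)) < 1e-12
same_norm = abs(math.sqrt(inner(x_hat, x_hat).real)
                - math.sqrt(inner(x, x).real)) < 1e-12
```

Both flags come out true, so for this example the inner product and the norm do not depend on which coordinate system is used.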

So far we have not really said anything new, nor has the matrix $A$, or its
eigenvectors, come into play. We know that a matrix is really a linear transformation, so
we express this view of a matrix as a function by writing generically that $Ax = y$.
The matrix $U$ will diagonalize $A$, creating the diagonal matrix $D$ with diagonal
entries equal to the eigenvalues of $A$. We can write this as $U^* A U = D$ and convert
to $U^* A = D U^*$. Then we have

\[ \hat{y} = U^* y = U^* A x = D U^* x = D\hat{x} \]

So with the coordinatized vectors, the transformation by the matrix $A$ can be
accomplished with multiplication by a diagonal matrix $D$. A moment’s thought
should convince you that a matrix-vector product with a diagonal matrix is exceedingly
simple computationally. Geometrically, this is simply stretching, contracting and/or
reflecting in the direction of each basis vector (“axis”). And the multiples used for
these changes are the diagonal entries of the diagonal matrix, the eigenvalues of $A$.
So the new coordinate system (provided by the orthonormal basis of eigenvectors)
is a collection of mutually perpendicular unit vectors where inner products are
preserved, and the action of the matrix $A$ is described by multiples (eigenvalues) of
the entries of the coordinatized versions of the vectors. Nice.
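
To see the diagonal action concretely, the following Python sketch works through one hypothetical example. The Hermitian matrix `A`, its eigenvalues 3 and 1, and the unit eigenvectors placed in the columns of `U` are our own illustrative choices, not taken from the text. The code compares the coordinates of $Ax$ with the entries of $\hat{x}$ scaled by the eigenvalues.

```python
import math

# Hypothetical Hermitian (hence normal) matrix with eigenvalues 3 and 1.
A = [[2, 1j],
     [-1j, 2]]
eigenvalues = [3, 1]

s = 1 / math.sqrt(2)
# Columns of U: unit eigenvectors (i, 1)/sqrt(2) for 3 and (-i, 1)/sqrt(2) for 1.
U = [[1j * s, -1j * s],
     [s, s]]

def adjoint(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    rows, cols = len(M), len(M[0])
    return [[M[r][c].conjugate() for r in range(rows)] for c in range(cols)]

def mat_vec(M, v):
    """Matrix-vector product, computed one row at a time."""
    return [sum(entry * x for entry, x in zip(row, v)) for row in M]

x = [2 + 1j, 3 - 4j]
y = mat_vec(A, x)                      # y = A x in the original coordinates
x_hat = mat_vec(adjoint(U), x)         # coordinates of x
y_hat = mat_vec(adjoint(U), y)         # coordinates of y

# Multiplying by the diagonal matrix D just scales each coordinate.
scaled = [lam * entry for lam, entry in zip(eigenvalues, x_hat)]
matches = all(abs(a - b) < 1e-12 for a, b in zip(y_hat, scaled))
```

Here `matches` is true: in the new coordinate system the work of `A` reduces to one multiplication per coordinate.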

Reading  Questions 

1.  Name  three  broad  classes  of  normal  matrices  that  we  have  studied  previously.  No  set 
that  you  give  should  be  a subset  of  another  on  your  list. 

2.  Compare  and  contrast  Theorem  UTMR  with  Theorem  OD. 

3. Given an $n \times n$ matrix $A$, why would you desire an orthonormal basis of $\mathbb{C}^n$ composed
entirely of eigenvectors of $A$?

Exercises 

T10 Exercise MM.T35 asked you to show that $AA^*$ is Hermitian. Prove directly that
$AA^*$ is a normal matrix.

T20 In the discussion following Theorem OBNM we comment that the equation $\hat{x} = U^* x$
is just Theorem COB in disguise. Formulate this observation more formally and prove the
equivalence.

T30  For  the  proof  of  Theorem  PTMT  we  only  show  that  the  product  of  two  lower 
triangular  matrices  is  again  lower  triangular.  Provide  a proof  that  the  product  of  two  upper 
triangular  matrices  is  again  upper  triangular.  Look  to  the  proof  of  Theorem  PTMT  for 
guidance  if  you  need  a hint. 


Chapter  P 
Preliminaries 


“Preliminaries”  are  basic  mathematical  concepts  you  are  likely  to  have  seen  before. 
So  we  have  collected  them  near  the  end  as  reference  material  (despite  the  name). 
Head  back  here  when  you  need  a refresher,  or  when  a theorem  or  exercise  builds  on 
some  of  this  basic  material. 


Section  CNO 

Complex  Number  Operations 

In  this  section  we  review  some  of  the  basics  of  working  with  complex  numbers. 

Subsection  CNA 

Arithmetic  with  complex  numbers 

A complex number is a linear combination of $1$ and $i = \sqrt{-1}$, typically written in
the form $a + bi$. Complex numbers can be added, subtracted, multiplied and divided,
just like we are used to doing with real numbers, including the restriction on division
by zero. We will not define these operations carefully immediately, but instead first
illustrate with examples.

Example  ACN  Arithmetic  of  complex  numbers 

\begin{align*}
(2 + 5i) + (6 - 4i) &= (2 + 6) + (5 + (-4))i = 8 + i \\
(2 + 5i) - (6 - 4i) &= (2 - 6) + (5 - (-4))i = -4 + 9i \\
(2 + 5i)(6 - 4i) &= (2)(6) + (5i)(6) + (2)(-4i) + (5i)(-4i) = 12 + 30i - 8i - 20i^2 \\
&= 12 + 22i - 20(-1) = 32 + 22i
\end{align*}

Division takes just a bit more care. We multiply the numerator and denominator by a complex
number chosen to make the denominator a real number, and then we can write the result as a
complex number.

\[ \frac{2 + 5i}{6 - 4i} = \frac{2 + 5i}{6 - 4i} \cdot \frac{6 + 4i}{6 + 4i} = \frac{-8 + 38i}{52} = -\frac{8}{52} + \frac{38}{52}i = -\frac{2}{13} + \frac{19}{26}i \]

△

In  this  example,  we  used  6 + 4*  to  convert  the  denominator  in  the  fraction  to  a 
real  number.  This  number  is  known  as  the  conjugate,  which  we  define  in  the  next 
section. 

We will often exploit the basic properties of complex number addition, subtraction,
multiplication and division, so we will carefully define the two basic operations,
together with a definition of equality, and then collect eleven basic properties in a
theorem.

Definition  CNE  Complex  Number  Equality 

The complex numbers $\alpha = a + bi$ and $\beta = c + di$ are equal, denoted $\alpha = \beta$, if $a = c$
and $b = d$. □

Definition CNA Complex Number Addition

The sum of the complex numbers $\alpha = a + bi$ and $\beta = c + di$, denoted $\alpha + \beta$, is
$(a + c) + (b + d)i$. □

Definition CNM Complex Number Multiplication

The product of the complex numbers $\alpha = a + bi$ and $\beta = c + di$, denoted $\alpha\beta$, is
$(ac - bd) + (ad + bc)i$. □
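
The definitions above translate directly into code. The following Python sketch stores a complex number $a + bi$ as the pair `(a, b)` and uses exact fractions so that the results of Example ACN can be compared without rounding. The function names `c_add`, `c_mul` and `c_div` are hypothetical, chosen only for this illustration. Division is done by multiplying by the conjugate of the denominator, as in the example.

```python
from fractions import Fraction

def c_add(alpha, beta):
    """Definition CNA: (a + bi) + (c + di) = (a + c) + (b + d)i."""
    (a, b), (c, d) = alpha, beta
    return (a + c, b + d)

def c_mul(alpha, beta):
    """Definition CNM: (a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    (a, b), (c, d) = alpha, beta
    return (a * c - b * d, a * d + b * c)

def c_div(alpha, beta):
    """Divide by multiplying numerator and denominator by the conjugate of beta."""
    c, d = beta
    denominator = c * c + d * d        # beta times its conjugate, a real number
    if denominator == 0:
        raise ZeroDivisionError("division by the complex number zero")
    p, q = c_mul(alpha, (c, -d))
    return (Fraction(p, denominator), Fraction(q, denominator))

alpha, beta = (2, 5), (6, -4)
total = c_add(alpha, beta)       # (8, 1), that is 8 + i
product = c_mul(alpha, beta)     # (32, 22), that is 32 + 22i
quotient = c_div(alpha, beta)    # (Fraction(-2, 13), Fraction(19, 26))
```

Python also has a built-in `complex` type, but it uses floating-point numbers, so the exact fractions $-\frac{2}{13}$ and $\frac{19}{26}$ would appear only as decimal approximations.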

Theorem  PCNA  Properties  of  Complex  Number  Arithmetic 

The operations of addition and multiplication of complex numbers have the following
properties.

• ACCN Additive Closure, Complex Numbers
If $\alpha, \beta \in \mathbb{C}$, then $\alpha + \beta \in \mathbb{C}$.

• MCCN Multiplicative Closure, Complex Numbers
If $\alpha, \beta \in \mathbb{C}$, then $\alpha\beta \in \mathbb{C}$.

• CACN Commutativity of Addition, Complex Numbers
For any $\alpha, \beta \in \mathbb{C}$, $\alpha + \beta = \beta + \alpha$.

• CMCN Commutativity of Multiplication, Complex Numbers
For any $\alpha, \beta \in \mathbb{C}$, $\alpha\beta = \beta\alpha$.

• AACN Additive Associativity, Complex Numbers
For any $\alpha, \beta, \gamma \in \mathbb{C}$, $\alpha + (\beta + \gamma) = (\alpha + \beta) + \gamma$.

• MACN Multiplicative Associativity, Complex Numbers
For any $\alpha, \beta, \gamma \in \mathbb{C}$, $\alpha(\beta\gamma) = (\alpha\beta)\gamma$.

• DCN Distributivity, Complex Numbers
For any $\alpha, \beta, \gamma \in \mathbb{C}$, $\alpha(\beta + \gamma) = \alpha\beta + \alpha\gamma$.

• ZCN Zero, Complex Numbers
There is a complex number $0 = 0 + 0i$ so that for any $\alpha \in \mathbb{C}$, $0 + \alpha = \alpha$.

• OCN One, Complex Numbers
There is a complex number $1 = 1 + 0i$ so that for any $\alpha \in \mathbb{C}$, $1\alpha = \alpha$.

• AICN Additive Inverse, Complex Numbers
For every $\alpha \in \mathbb{C}$ there exists $-\alpha \in \mathbb{C}$ so that $\alpha + (-\alpha) = 0$.

• MICN Multiplicative Inverse, Complex Numbers
For every $\alpha \in \mathbb{C}$, $\alpha \neq 0$, there exists $\frac{1}{\alpha} \in \mathbb{C}$ so that $\alpha\left(\frac{1}{\alpha}\right) = 1$.

Proof.  We  could  derive  each  of  these  properties  of  complex  numbers  with  a proof 
that  builds  on  the  identical  properties  of  the  real  numbers.  The  only  proof  that 
might  be  at  all  interesting  would  be  to  show  Property  MICN  since  we  would  need 
to  trot  out  a conjugate.  For  this  property,  and  especially  for  the  others,  we  might  be 
tempted  to  construct  proofs  of  the  identical  properties  for  the  reals.  This  would  take 
us  way  too  far  afield,  so  we  will  draw  a line  in  the  sand  right  here  and  just  agree 
that these eleven fundamental behaviors are true. OK?

Mostly we have stated these eleven properties carefully so that we can make
reference  to  them  later  in  other  proofs.  So  we  will  be  linking  back  here  often.  ■ 

Zero and one play special roles, of course, and especially zero. Our first result is
one we take for granted, but it requires a proof, derived from our eleven properties.
You  can  compare  it  to  its  vector  space  counterparts,  Theorem  ZSSM  and  Theorem 
ZVSM. 

Theorem  ZPCN  Zero  Product,  Complex  Numbers 
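Suppose $\alpha \in \mathbb{C}$. Then $0\alpha = 0$.

Proof.

\begin{align*}
0\alpha &= 0\alpha + 0 && \text{Property ZCN} \\
&= 0\alpha + (0\alpha - (0\alpha)) && \text{Property AICN} \\
&= (0\alpha + 0\alpha) - (0\alpha) && \text{Property AACN} \\
&= (0 + 0)\alpha - (0\alpha) && \text{Property DCN} \\
&= 0\alpha - (0\alpha) && \text{Property ZCN} \\
&= 0 && \text{Property AICN}
\end{align*}

■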


Our next theorem could be called “cancellation,” since it will make that possible,
though you will never see us drawing slashes through parts of products. We will
also make very limited use of this result, or its vector space counterpart, Theorem
SMEZV.

Theorem  ZPZT  Zero  Product,  Zero  Terms 

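Suppose $\alpha, \beta \in \mathbb{C}$. Then $\alpha\beta = 0$ if and only if at least one of $\alpha = 0$ or $\beta = 0$.

Proof. (⇒) We conduct the forward argument in two cases. First suppose that $\alpha = 0$.
Then we are done. (That was easy.)

For the second case, suppose now that $\alpha \neq 0$. Then

\begin{align*}
\beta &= 1\beta && \text{Property OCN} \\
&= \left(\frac{1}{\alpha}\alpha\right)\beta && \text{Property MICN} \\
&= \frac{1}{\alpha}(\alpha\beta) && \text{Property MACN} \\
&= \frac{1}{\alpha}0 && \text{Hypothesis} \\
&= 0 && \text{Theorem ZPCN}
\end{align*}

(⇐) With two applications of Theorem ZPCN it is easy to see that if one of the
scalars is zero, then so is the product. ■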

As  an  equivalence  (Proof  Technique  E),  we  could  restate  this  result  as  the 
contrapositive (Proof Technique CP) by negating each statement, so it would read
“$\alpha\beta \neq 0$ if and only if $\alpha \neq 0$ and $\beta \neq 0$.” After you have learned more about
nonsingular  matrices  and  matrix  multiplication,  you  should  compare  this  result  with 
Theorem  NPNT. 


Subsection  CCN 

Conjugates  of  Complex  Numbers 

Definition  CCN  Conjugate  of  a Complex  Number 

The conjugate of the complex number $\alpha = a + bi \in \mathbb{C}$ is the complex number
$\overline{\alpha} = a - bi$. □

Example CSCN Conjugate of some complex numbers

\[ \overline{2 + 3i} = 2 - 3i \qquad \overline{5 - 4i} = 5 + 4i \qquad \overline{-3 + 0i} = -3 + 0i \qquad \overline{0 + 0i} = 0 + 0i \]

△

Notice how the conjugate of a real number leaves the number unchanged. The
conjugate enjoys some basic properties that are useful when we work with linear
expressions involving addition and multiplication.

Theorem  CCRA  Complex  Conjugation  Respects  Addition 
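Suppose that $\alpha$ and $\beta$ are complex numbers. Then $\overline{\alpha + \beta} = \overline{\alpha} + \overline{\beta}$.

Proof. Let $\alpha = a + bi$ and $\beta = r + si$. Then

\[ \overline{\alpha + \beta} = \overline{(a + r) + (b + s)i} = (a + r) - (b + s)i = (a - bi) + (r - si) = \overline{\alpha} + \overline{\beta} \]

■

Theorem CCRM Complex Conjugation Respects Multiplication
Suppose that $\alpha$ and $\beta$ are complex numbers. Then $\overline{\alpha\beta} = \overline{\alpha}\,\overline{\beta}$.

Proof. Let $\alpha = a + bi$ and $\beta = r + si$. Then

\begin{align*}
\overline{\alpha\beta} &= \overline{(ar - bs) + (as + br)i} = (ar - bs) - (as + br)i \\
&= (ar - (-b)(-s)) + (a(-s) + (-b)r)i = (a - bi)(r - si) = \overline{\alpha}\,\overline{\beta}
\end{align*}

■

Theorem CCT Complex Conjugation Twice
Suppose that $\alpha$ is a complex number. Then $\overline{\overline{\alpha}} = \alpha$.

Proof. Let $\alpha = a + bi$. Then

\[ \overline{\overline{\alpha}} = \overline{a - bi} = a - (-bi) = a + bi = \alpha \]

■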
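
The three conjugation theorems can be spot-checked with Python's built-in complex numbers. This is only a sanity check on two hypothetical sample values, not a substitute for the proofs. Small integer components are used so that the floating-point arithmetic is exact.

```python
alpha = complex(2, 3)    # 2 + 3i
beta = complex(5, -4)    # 5 - 4i

# Theorem CCRA: the conjugate of a sum is the sum of the conjugates.
ccra_holds = (alpha + beta).conjugate() == alpha.conjugate() + beta.conjugate()

# Theorem CCRM: the conjugate of a product is the product of the conjugates.
ccrm_holds = (alpha * beta).conjugate() == alpha.conjugate() * beta.conjugate()

# Theorem CCT: conjugating twice returns the original number.
cct_holds = alpha.conjugate().conjugate() == alpha
```

All three flags evaluate to `True` for these values. For instance, the product is $22 + 7i$, and both sides of the CCRM check equal $22 - 7i$.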


Subsection  MCN 

Modulus  of  a Complex  Number 

We define one more operation with complex numbers that may be new to you.

Definition MCN Modulus of a Complex Number

The modulus of the complex number $\alpha = a + bi \in \mathbb{C}$ is the nonnegative real number

\[ |\alpha| = \sqrt{\alpha\overline{\alpha}} = \sqrt{a^2 + b^2}. \]

□

Example MSCN Modulus of some complex numbers

\[ |2 + 3i| = \sqrt{13} \qquad |5 - 4i| = \sqrt{41} \qquad |-3 + 0i| = 3 \qquad |0 + 0i| = 0 \]

△

The modulus can be interpreted as a version of the absolute value for complex
numbers, as is suggested by the notation employed. You can see this in how $|-3| =
|-3 + 0i| = 3$. Notice too how the modulus of the complex zero, $0 + 0i$, has value $0$.
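
A short Python sketch of the modulus, following the definition as the square root of a number times its conjugate. The function name `modulus` is our own. Python's built-in `abs` computes the same quantity for complex numbers, so it serves as an independent comparison.

```python
import math

def modulus(alpha):
    """Square root of alpha times its conjugate, as in Definition MCN."""
    product = alpha * alpha.conjugate()   # a real number stored as a complex value
    return math.sqrt(product.real)

examples = [2 + 3j, 5 - 4j, complex(-3, 0), complex(0, 0)]
moduli = [modulus(z) for z in examples]

# Compare against the built-in absolute value, allowing for rounding.
agrees_with_abs = all(math.isclose(modulus(z), abs(z)) for z in examples)
```

The list `moduli` holds floating-point values for $\sqrt{13}$, $\sqrt{41}$, $3$ and $0$, matching Example MSCN.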


Section  SET 
Sets 


We  will  frequently  work  carefully  with  sets,  so  the  material  in  this  review  section 
is  very  important.  If  these  topics  are  new  to  you,  study  this  section  carefully  and 
consider  consulting  another  text  for  a more  comprehensive  introduction. 

Subsection  SET 
Sets 

Definition  SET  Set 

A set is an unordered collection of objects. If $S$ is a set and $x$ is an object that is
in the set $S$, we write $x \in S$. If $x$ is not in $S$, then we write $x \notin S$. We refer to the
objects in a set as its elements. □

Hard to get much more basic than that. Notice that the objects in a set can be
anything, and there is no notion of order among the elements of the set. A set can be
finite  as  well  as  infinite.  A set  can  contain  other  sets  as  its  objects.  At  a primitive 
level,  a set  is  just  a way  to  break  up  some  class  of  objects  into  two  groupings:  those 
objects  in  the  set,  and  those  objects  not  in  the  set. 

Example  SETM  Set  membership 

From the set of all possible symbols, construct the following set of three symbols,

\[ S = \{■, ♦, ★\} \]

Then the statement $■ \in S$ is true, while the statement $▲ \in S$ is false. However, then
the statement $▲ \notin S$ is true. △

A portion of a set is known as a subset. Notice how the following definition uses
an implication (if whenever... then...). Note too how the definition of a subset relies
on the definition of a set through the idea of set membership.

Definition SSET Subset

If $S$ and $T$ are two sets, then $S$ is a subset of $T$, written $S \subseteq T$, if whenever $x \in S$
then $x \in T$. □

If we want to disallow the possibility that $S$ is the same as $T$, we use the notation
$S \subset T$ and we say that $S$ is a proper subset of $T$. We will do an example, but first
we will define a special set.

Definition  ES  Empty  Set 

The empty set is the set with no elements. It is denoted by $\emptyset$. □

Example SSET Subset

If $S = \{■, ♦, ★\}$, $T = \{★, ♦\}$, $R = \{▲, ★\}$, then

\[ T \subseteq S \qquad R \not\subseteq T \qquad \emptyset \subseteq S \]

\[ T \subset S \qquad S \subseteq S \qquad S \not\subset S \]

△

What does it mean for two sets to be equal? They must be the same. Well, that
explanation is not really too helpful, is it? How about: If $A \subseteq B$ and $B \subseteq A$, then
$A$ equals $B$. This gives us something to work with: if $A$ is a subset of $B$, and vice
versa, then they must really be the same set. We will now make the symbol “=” do
double-duty and extend its use to statements like $A = B$, where $A$ and $B$ are sets.
Here is the definition, which we will reference often.

Definition SE Set Equality

Two sets, $S$ and $T$, are equal, if $S \subseteq T$ and $T \subseteq S$. In this case, we write $S = T$. □
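
The definitions of subset and set equality can be written out almost word for word in Python. In this sketch the helper names `is_subset`, `is_proper_subset` and `sets_equal` are hypothetical, and the sample sets reuse the symbols from the subset example. Python's built-in sets already provide `<=`, `<` and `==` for the same purposes; the helpers are spelled out only to mirror the definitions.

```python
S = {"■", "♦", "★"}
T = {"★", "♦"}
R = {"▲", "★"}

def is_subset(X, Y):
    """Definition SSET: whenever x is in X, then x is in Y."""
    return all(x in Y for x in X)

def sets_equal(X, Y):
    """Definition SE: each set is a subset of the other."""
    return is_subset(X, Y) and is_subset(Y, X)

def is_proper_subset(X, Y):
    """A subset that is not the same set."""
    return is_subset(X, Y) and not sets_equal(X, Y)

t_in_s = is_subset(T, S)                   # True
r_in_t = is_subset(R, T)                   # False, since "▲" is not in T
empty_in_s = is_subset(set(), S)           # True, nothing needs checking
t_proper_in_s = is_proper_subset(T, S)     # True
s_proper_in_s = is_proper_subset(S, S)     # False
order_irrelevant = sets_equal(S, {"★", "■", "♦"})   # True
```

Notice that the empty set passes the subset test because there is no element that could fail it.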

Sets  are  typically  written  inside  of  braces,  as  { },  as  we  have  seen  above.  However, 
when  sets  have  more  than  a few  elements,  a description  will  typically  have  two 
components.  The  first  is  a description  of  the  general  type  of  objects  contained  in  a 
set,  while  the  second  is  some  sort  of  restriction  on  the  properties  the  objects  have. 
Every  object  in  the  set  must  be  of  the  type  described  in  the  first  part  and  it  must 
satisfy  the  restrictions  in  the  second  part.  Conversely,  any  object  of  the  proper  type 
for  the  first  part,  that  also  meets  the  conditions  of  the  second  part,  will  be  in  the 
set.  These  two  parts  are  set  off  from  each  other  somehow,  often  with  a vertical  bar 
(|)  or  a colon  (:). 

I like  to  think  of  sets  as  clubs.  The  first  part  is  some  description  of  the  type  of 
people  who  might  belong  to  the  club,  the  basic  objects.  For  example,  a bicycle  club 
would  describe  its  members  as  being  people  who  like  to  ride  bicycles.  The  second 
part  is  like  a membership  committee,  it  restricts  the  people  who  are  allowed  in  the 
club.  Continuing  with  our  bicycle  club  analogy,  we  might  decide  to  limit  ourselves 
to  “serious”  riders  and  only  have  members  who  can  document  having  ridden  100 
kilometers  or  more  in  a single  day  at  least  one  time. 

The  restrictions  on  membership  can  migrate  around  some  between  the  first  and 
second  part,  and  there  may  be  several  ways  to  describe  the  same  set  of  objects.  Here 
is  a more  mathematical  example,  employing  the  set  of  all  integers,  Z,  to  describe 
the  set  of  even  integers. 

\begin{align*}
E &= \{ x \in \mathbb{Z} \mid x \text{ is an even number} \} \\
&= \{ x \in \mathbb{Z} \mid 2 \text{ divides } x \text{ evenly} \} \\
&= \{ 2k \mid k \in \mathbb{Z} \}
\end{align*}

Notice how this set tells us that its objects are integer numbers (not, say, matrices
or functions, for example) and just those that are even. So we can write that $10 \in E$,
while $17 \notin E$ once we check the membership criteria. We also recognize the question

\[ \begin{bmatrix} 1 & -3 & 5 \\ 2 & 0 & 3 \end{bmatrix} \in E? \]

as being simply ridiculous.
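
Set-builder descriptions have a close analogue in Python's set comprehensions: the first part names the type of object and the second part states the restriction. Since a program cannot hold all of the integers, the sketch below assumes a finite window from $-20$ to $20$ as a stand-in; the variable names are our own.

```python
window = range(-20, 21)   # finite stand-in for the integers (an assumption)

# { x in Z | x is an even number }, tested with a remainder
by_remainder = {x for x in window if x % 2 == 0}

# { x in Z | 2 divides x evenly }, tested by searching for a quotient
by_divisibility = {x for x in window if any(x == 2 * q for q in window)}

# { 2k | k in Z }, built directly from a formula
by_formula = {2 * k for k in range(-10, 11)}

same_set = by_remainder == by_divisibility == by_formula
ten_is_member = 10 in by_formula          # True
seventeen_is_member = 17 in by_formula    # False
```

On this window the three descriptions produce the same set, which illustrates how restrictions can move between the two parts of a description.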


Subsection  SC 
Set  Cardinality 

On  occasion,  we  will  be  interested  in  the  number  of  elements  in  a finite  set.  Here  is 
the  definition  and  the  associated  notation. 

Definition  C Cardinality 

Suppose $S$ is a finite set. Then the number of elements in $S$ is called the cardinality
or size of $S$, and is denoted $|S|$. □

Example CS Cardinality and Size

If $S = \{♦, ★, ■\}$, then $|S| = 3$. △


Subsection  SO 
Set  Operations 


In  this  subsection  we  define  and  illustrate  the  three  most  common  basic  ways  to 
manipulate  sets  to  create  other  sets.  Since  much  of  linear  algebra  is  about  sets,  we 
will  use  these  often. 


Definition  SU  Set  Union 

Suppose $S$ and $T$ are sets. Then the union of $S$ and $T$, denoted $S \cup T$, is the set
whose elements are those that are elements of $S$ or of $T$, or both. More formally,

\[ x \in S \cup T \text{ if and only if } x \in S \text{ or } x \in T \]

□

Notice that the use of the word “or” in this definition is meant to be
non-exclusive. That is, it allows for $x$ to be an element of both $S$ and $T$ and still qualify
for membership in $S \cup T$.

Example SU Set union

If $S = \{♦, ★, ■\}$ and $T = \{♦, ★, ▲\}$ then $S \cup T = \{♦, ★, ■, ▲\}$. △

Definition SI Set Intersection

Suppose $S$ and $T$ are sets. Then the intersection of $S$ and $T$, denoted $S \cap T$, is the
set whose elements are only those that are elements of $S$ and of $T$. More formally,

\[ x \in S \cap T \text{ if and only if } x \in S \text{ and } x \in T \]

□

Example SI Set intersection

If $S = \{♦, ★, ■\}$ and $T = \{♦, ★, ▲\}$ then $S \cap T = \{♦, ★\}$. △

The  union  and  intersection  of  sets  are  operations  that  begin  with  two  sets  and 
produce  a third,  new,  set.  Our  final  operation  is  the  set  complement,  which  we 
usually  think  of  as  an  operation  that  takes  a single  set  and  creates  a second,  new, 
set.  However,  if  you  study  the  definition  carefully,  you  will  see  that  it  needs  to  be 
computed  relative  to  some  “universal”  set. 


Definition  SC  Set  Complement 

Suppose $S$ is a set that is a subset of a universal set $U$. Then the complement of
$S$, denoted $\overline{S}$, is the set whose elements are those that are elements of $U$ and not
elements of $S$. More formally,

\[ x \in \overline{S} \text{ if and only if } x \in U \text{ and } x \notin S \]


□ 


Notice  that  there  is  nothing  at  all  special  about  the  universal  set.  This  is  simply 
a term  that  suggests  that  U contains  all  of  the  possible  objects  we  are  considering. 
Often  this  set  will  be  clear  from  the  context,  and  we  will  not  think  much  about  it, 
nor  reference  it  in  our  notation.  In  other  cases  (rarely  in  our  work  in  this  course) 
the  exact  nature  of  the  universal  set  must  be  made  explicit,  and  reference  to  it  will 
possibly  be  carried  through  in  our  choice  of  notation. 

Example  SC  Set  complement 

If $U = \{♦, ★, ■, ▲\}$ and $S = \{♦, ★, ■\}$ then $\overline{S} = \{▲\}$. △
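
The three set operations of this subsection correspond to built-in operators on Python sets. The sketch below reuses the symbols from the examples. Python has no complement operator because a complement needs a universal set, so the code computes it as a set difference from an explicitly chosen `U`.

```python
S = {"♦", "★", "■"}
T = {"♦", "★", "▲"}
U = {"♦", "★", "■", "▲"}   # the universal set for this example

union = S | T             # elements of S or of T, or both
intersection = S & T      # elements of S and of T
complement_of_S = U - S   # elements of U that are not in S
```

The results are `{"♦", "★", "■", "▲"}`, `{"♦", "★"}` and `{"▲"}`, in agreement with the set union, set intersection and set complement examples.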

There  are  many  more  natural  operations  that  can  be  performed  on  sets,  such  as 
an  exclusive-or  and  the  symmetric  difference.  Many  of  these  can  be  defined  in  terms 
of  the  union,  intersection  and  complement.  We  will  not  have  much  need  of  them 
in  this  course,  and  so  we  will  not  give  precise  descriptions  here  in  this  preliminary 
section. 

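There is also an interesting variety of basic results that describe the interplay
of these operations with each other. We mention just two as an example; these are
known as DeMorgan’s Laws.

\begin{align*}
\overline{(S \cup T)} &= \overline{S} \cap \overline{T} \\
\overline{(S \cap T)} &= \overline{S} \cup \overline{T}
\end{align*}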


Besides having an appealing symmetry, these two facts are worth mentioning since
constructing the proofs of each is a useful exercise that will require a solid understanding
of all but one of the definitions presented in this section. Give it a try.
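
Before attempting the proofs, it can be reassuring to test DeMorgan's Laws on a small case. The Python sketch below checks both laws for every pair of subsets of a hypothetical four-element universal set. An exhaustive check on one small universe is evidence, not a proof, and the helper names are our own.

```python
from itertools import combinations

U = frozenset(range(4))   # hypothetical universal set {0, 1, 2, 3}

def subsets(universe):
    """All subsets of a finite set, from the empty set up to the set itself."""
    items = list(universe)
    return [frozenset(choice)
            for size in range(len(items) + 1)
            for choice in combinations(items, size)]

def complement(X):
    """Complement relative to the universal set U."""
    return U - X

laws_hold = all(
    complement(S | T) == complement(S) & complement(T)
    and complement(S & T) == complement(S) | complement(T)
    for S in subsets(U)
    for T in subsets(U)
)
```

Here `laws_hold` is true for all 256 pairs of subsets. A proof must instead argue from the definitions of union, intersection, complement and set equality.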


Reference 


Proof  Techniques 

In  this  section  we  collect  many  short  essays  designed  to  help  you  understand  how  to 
read,  understand  and  construct  proofs.  Some  are  very  factual,  while  others  consist 
of  advice.  They  appear  in  the  order  that  they  are  first  needed  (or  advisable)  in  the 
text,  and  are  meant  to  be  self-contained.  So  you  should  not  think  of  reading  through 
this  section  in  one  sitting  as  you  begin  this  course.  But  be  sure  to  head  back  here  for 
a first  reading  whenever  the  text  suggests  it.  Also  think  about  returning  to  browse 
at  various  points  during  the  course,  and  especially  as  you  struggle  with  becoming 
an  accomplished  mathematician  who  is  comfortable  with  the  difficult  process  of 
designing  new  proofs. 

Proof  Technique  D 
Definitions 

A definition  is  a made-up  term,  used  as  a kind  of  shortcut  for  some  typically  more 
complicated  idea.  For  example,  we  say  a whole  number  is  even  as  a shortcut  for 
saying  that  when  we  divide  the  number  by  two  we  get  a remainder  of  zero.  With  a 
precise  definition,  we  can  answer  certain  questions  unambiguously.  For  example,  did 
you  ever  wonder  if  zero  was  an  even  number?  Now  the  answer  should  be  clear  since 
we  have  a precise  definition  of  what  we  mean  by  the  term  even. 

A single  term  might  have  several  possible  definitions.  For  example,  we  could  say 
that  the  whole  number  n is  even  if  there  is  some  whole  number  k such  that  n = 2k. 
We  say  this  is  an  equivalent  definition  since  it  categorizes  even  numbers  the  same 
way  our  first  definition  does. 
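
Two definitions are equivalent when they sort objects in exactly the same way. As an informal check of the two definitions of even, the Python sketch below compares them on a range of integers. The function names and the range are our own choices, and negative integers are included as an assumption about what “whole number” covers.

```python
def is_even_by_remainder(n):
    """First definition: dividing n by two leaves a remainder of zero."""
    return n % 2 == 0

def is_even_by_factor(n):
    """Second definition: there is some whole number k such that n = 2k."""
    return any(n == 2 * k for k in range(-abs(n), abs(n) + 1))

definitions_agree = all(
    is_even_by_remainder(n) == is_even_by_factor(n) for n in range(-50, 51)
)
zero_is_even = is_even_by_remainder(0) and is_even_by_factor(0)
```

Both definitions classify every tested integer the same way, and both say that zero is even.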

Definitions  are  like  two-way  streets  — we  can  use  a definition  to  replace  something 
rather  complicated  by  its  definition  (if  it  fits)  and  we  can  replace  a definition  by  its 
more  complicated  description.  A definition  is  usually  written  as  some  form  of  an 
implication,  such  as  “If  something-nice-happens,  then  blatzo.”  However,  this  also 
means that “If blatzo, then something-nice-happens,” even though this may not be
formally stated. This is what we mean when we say a definition is a two-way street:
it is really two implications, going in opposite “directions.”

Anybody  (including  you)  can  make  up  a definition,  so  long  as  it  is  unambiguous, 
but  the  real  test  of  a definition’s  utility  is  whether  or  not  it  is  useful  for  concisely 
describing  interesting  or  frequent  situations.  We  will  try  to  restrict  our  definitions 
to  parts  of  speech  that  are  nouns  (e.g.  “matrix”)  or  adjectives  (e.g.  “nonsingular” 
matrix),  and  so  avoid  definitions  that  are  verbs  or  adverbs.  Therefore  our  definitions 
will  describe  an  object  (noun)  or  a property  of  an  object  (adjective). 

We  will  talk  about  theorems  later  (and  especially  equivalences).  For  now,  be  sure 
not  to  confuse  the  notion  of  a definition  with  that  of  a theorem. 

In  this  book,  we  will  display  every  new  definition  carefully  set-off  from  the  text, 
and  the  term  being  defined  will  be  written  thus:  definition.  Additionally,  there  is  a 
full  list  of  all  the  definitions,  in  order  of  their  appearance  located  in  the  reference 
section  of  the  same  name  (Definitions).  Definitions  are  critical  to  doing  mathematics 
and  proving  theorems,  so  we  have  given  you  lots  of  ways  to  locate  a definition  should 
you  forget  its. . . uh,  well,  . . . definition. 

Can you formulate a precise definition for what it means for a number to be odd?
(Do not just say it is the opposite of even. Act as if you do not have a definition for
even yet.) Can you formulate your definition a second, equivalent, way? Can you
employ your definition to test an odd and an even number for “odd-ness”?

Proof  Technique  T 
Theorems 

Higher  mathematics  is  about  understanding  theorems.  Reading  them,  understanding 
them,  applying  them,  proving  them.  Every  theorem  is  a shortcut  — we  prove 
something  in  general,  and  then  whenever  we  find  a specific  instance  covered  by  the 
theorem  we  can  immediately  say  that  we  know  something  else  about  the  situation 
by  applying  the  theorem.  In  many  cases,  this  new  information  can  be  gained  with 
much  less  effort  than  if  we  did  not  know  the  theorem. 

The  first  step  in  understanding  a theorem  is  to  realize  that  the  statement  of 
every  theorem  can  be  rewritten  using  statements  of  the  form  “If  something-happens, 
then  something-else-happens.”  The  “something-happens”  part  is  the  hypothesis 
and  the  “something-else-happens”  is  the  conclusion.  To  understand  a theorem, 
it  helps  to  rewrite  its  statement  using  this  construction.  To  apply  a theorem,  we 
verify  that  “something-happens”  in  a particular  instance  and  immediately  conclude 
that  “something-else-happens.”  To  prove  a theorem,  we  must  argue  based  on  the 
assumption  that  the  hypothesis  is  true,  and  arrive  through  the  process  of  logic  that 
the conclusion must then also be true.

Proof Technique L
Language 

Like  any  science,  the  language  of  math  must  be  understood  before  further 
study  can  continue. 


Erin  Wilson,  Student 
September,  2004 

Mathematics  is  a language.  It  is  a way  to  express  complicated  ideas  clearly, 
precisely,  and  unambiguously.  Because  of  this,  it  can  be  difficult  to  read.  Read  slowly, 
and  have  pencil  and  paper  at  hand.  It  will  usually  be  necessary  to  read  something 
several  times.  While  reading  can  be  difficult,  it  is  even  harder  to  speak  mathematics, 
and  so  that  is  the  topic  of  this  technique. 

“Natural” language, in the present case English, is fraught with ambiguity.
Consider the possible meanings of the sentence: The fish is ready to eat. One fish, or two
fish? Are the fish hungry, or will the fish be eaten? (See Exercise SSLE.M10, Exercise
SSLE.M11, Exercise SSLE.M12, Exercise SSLE.M13.) In your daily interactions with
others, give some thought to how many misunderstandings arise from the ambiguity
of pronouns, modifiers and objects.

I am  going  to  suggest  a simple  modification  to  the  way  you  use  language  that 
will  make  it  much,  much  easier  to  become  proficient  at  speaking  mathematics  and 
eventually  it  will  become  second  nature.  Think  of  it  as  a training  aid  or  practice 
drill  you  might  use  when  learning  to  become  skilled  at  a sport. 

First, eliminate pronouns from your vocabulary when discussing linear algebra,
in class or with your colleagues. Do not use: it, that, those, their or similar sources
of confusion. This is the single easiest step you can take to make your oral
expression of mathematics clearer to others, and in turn, it will greatly help your own
understanding.

Now  rid  yourself  of  the  word  “thing”  (or  variants  like  “something”).  When 
you  are  tempted  to  use  this  word  realize  that  there  is  some  object  you  want  to 
discuss,  and  we  likely  have  a definition  for  that  object  (see  the  discussion  at  Proof 
Technique  D).  Always  “think  about  your  objects”  and  many  aspects  of  the  study  of 
mathematics  will  get  easier.  Ask  yourself:  “Am  I working  with  a set,  a number,  a 
function,  an  operation,  a differential  equation,  or  what?”  Knowing  what  an  object 
is  will  allow  you  to  narrow  down  the  procedures  you  may  apply  to  it.  If  you  have 
studied  an  object-oriented  computer  programming  language,  then  you  will  already 
have  experience  identifying  objects  and  thinking  carefully  about  what  procedures 
are  allowed  to  be  applied  to  them. 

Third,  eliminate  the  verb  “works”  (as  in  “the  equation  works”)  from  your 
vocabulary.  This  term  is  used  as  a substitute  when  we  are  not  sure  just  what  we 
are trying to accomplish. Usually we are trying to say that some object fulfills some
condition. The condition might even have a definition associated with it, making it
even easier to describe.

Last,  speak  slooooowly  and  thoughtfully  as  you  try  to  get  by  without  all  these 
lazy  words.  It  is  hard  at  first,  but  you  will  get  better  with  practice.  Especially  in  class, 
when  the  pressure  is  on  and  all  eyes  are  on  you,  do  not  succumb  to  the  temptation 
to  use  these  weak  words.  Slow  down,  we’d  all  rather  wait  for  a slow,  well-formed 
question  or  answer  than  a fast,  sloppy,  incomprehensible  one. 

You  will  find  the  improvement  in  your  ability  to  speak  clearly  about  complicated 
ideas  will  greatly  improve  your  ability  to  think  clearly  about  complicated  ideas. 
And  I believe  that  you  cannot  think  clearly  about  complicated  ideas  if  you  cannot 
formulate  questions  or  answers  clearly  in  the  correct  language.  This  is  as  applicable 
to  the  study  of  law,  economics  or  philosophy  as  it  is  to  the  study  of  science  or 
mathematics. 

In  this  spirit,  Dupont  Hubert  has  contributed  the  following  quotation,  which  is 
widely  used  in  French  mathematics  courses  (and  which  might  be  construed  as  the 
contrapositive  of  Proof  Technique  CP) 

Ce que l’on conçoit bien s’énonce clairement,

Et les mots pour le dire arrivent aisément.


Nicolas Boileau
L’art poétique
Chant I, 1674

which  translates  as 

Whatever  is  well  conceived  is  clearly  said, 

And  the  words  to  say  it  flow  with  ease. 

So  when  you  come  to  class,  check  your  pronouns  at  the  door,  along  with  other 
weak  words.  And  when  studying  with  friends,  you  might  make  a game  of  catching 
one  another  using  pronouns,  “thing,”  or  “works.”  I know  I’ll  be  calling  you  on  it! 

Proof  Technique  GS 
Getting  Started 

“I  don’t  know  how  to  get  started!”  is  often  the  lament  of  the  novice  proof-builder. 
Here  are  a few  pieces  of  advice. 

1.  As  mentioned  in  Proof  Technique  T,  rewrite  the  statement  of  the  theorem  in 
an  “if-then”  form.  This  will  simplify  identifying  the  hypothesis  and  conclusion, 
which  are  referenced  in  the  next  few  items. 
2.  Ask  yourself  what  kind  of  statement  you  are  trying  to  prove.  This  is  always 
part  of  your  conclusion.  Are  you  being  asked  to  conclude  that  two  numbers 
are  equal,  that  a function  is  differentiable  or  a set  is  a subset  of  another? 
You  cannot  bring  other  techniques  to  bear  if  you  do  not  know  what  type  of 
conclusion  you  have. 

3.  Write  down  reformulations  of  your  hypotheses.  Interpret  and  translate  each 
definition  properly. 

4.  Write  your  hypothesis  at  the  top  of  a sheet  of  paper  and  your  conclusion  at  the 
bottom.  See  if  you  can  formulate  a statement  that  precedes  the  conclusion  and 
also  implies  it.  Work  down  from  your  hypothesis,  and  up  from  your  conclusion, 
and  see  if  you  can  meet  in  the  middle.  When  you  are  finished,  rewrite  the 
proof  nicely,  from  hypothesis  to  conclusion,  with  verifiable  implications  giving 
each  subsequent  statement. 

5. As you work through your proof, think about what kinds of objects your
symbols represent. For example, suppose A is a set and f(x) is a real-valued
function. Then the expression A + f might make no sense if we have not
defined what it means to “add” a set to a function, so we can stop at that
point and adjust accordingly. On the other hand we might understand 2f to
be the function whose rule is described by (2f)(x) = 2f(x). “Think about
your objects” means to always verify that your objects and operations are
compatible.
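
The advice in item 5 has a direct analogue in programming. The following Python sketch is only an illustration: the names f, A and scale are hypothetical, chosen for this example. It shows an undefined operation (a set plus a function) being rejected, while the function 2f is well defined once we say what it means.

```python
# Illustrative analogy only: f, A and scale are hypothetical names.

def f(x):
    """A real-valued function, f(x) = x^2 + 1."""
    return x * x + 1.0

A = {1, 2, 3}  # a set

# "A + f" has no defined meaning, and Python refuses it, just as a
# careful proof-writer should.
try:
    A + f
    sum_is_defined = True
except TypeError:
    sum_is_defined = False

def scale(c, g):
    """Return the function c*g, whose rule is (c*g)(x) = c * g(x)."""
    return lambda x: c * g(x)

two_f = scale(2, f)  # the function 2f

print(sum_is_defined, two_f(3.0) == 2 * f(3.0))
```

The design point is the same as in a proof: before applying an operation, check that it has been defined for the kinds of objects at hand.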

Proof  Technique  C 
Constructive  Proofs 

Conclusions  of  proofs  come  in  a variety  of  types.  Often  a theorem  will  simply  assert 
that  something  exists.  The  best  way,  but  not  the  only  way,  to  show  something  exists 
is  to  actually  build  it.  Such  a proof  is  called  constructive.  The  thing  to  realize 
about  constructive  proofs  is  that  the  proof  itself  will  contain  a procedure  that  might 
be  used  computationally  to  construct  the  desired  object.  If  the  procedure  is  not  too 
cumbersome,  then  the  proof  itself  is  as  useful  as  the  statement  of  the  theorem. 

Proof  Technique  E 
Equivalences 

When a theorem uses the phrase “if and only if” (or the abbreviation “iff”) it is a
shorthand  way  of  saying  that  two  if-then  statements  are  true.  So  if  a theorem  says 
“P  if  and  only  if  Q,”  then  it  is  true  that  “if  P,  then  Q”  while  it  is  also  true  that  “if 
Q,  then  P.”  For  example,  it  may  be  a theorem  that  “I  wear  bright  yellow  knee-high 
plastic  boots  if  and  only  if  it  is  raining.”  This  means  that  I never  forget  to  wear  my 
super-duper  yellow  boots  when  it  is  raining  and  I would  not  be  seen  in  such  silly 
boots  unless  it  was  raining.  You  never  have  one  without  the  other.  I have  my  boots 
on  and  it  is  raining  or  I do  not  have  my  boots  on  and  it  is  dry. 

The upshot for proving such theorems is that it is like a 2-for-1 sale: we get to do
two proofs. Assume P and conclude Q, then start over and assume Q and conclude
P. For this reason, “if and only if” is sometimes abbreviated by <=>, while proofs
indicate which of the two implications is being proved by prefacing each with => or
<=. A carefully written proof will remind the reader which statement is being used
as the hypothesis; a quicker version will let the reader deduce it from the direction
of the arrow. Tradition dictates we do the “easy” half first, but that is hard for a
student to know until both halves are finished! Oh well, if you rewrite
your proofs (a good habit), you can then choose to put the easy half first.

Theorems  of  this  type  are  called  “equivalences”  or  “characterizations,”  and  they 
are  some  of  the  most  pleasing  results  in  mathematics.  They  say  that  two  objects,  or 
two  situations,  are  really  the  same.  You  do  not  have  one  without  the  other,  like  rain 
and  my  yellow  boots.  The  more  different  P and  Q seem  to  be,  the  more  pleasing 
it  is  to  discover  they  are  really  equivalent.  And  if  P describes  a very  mysterious 
solution  or  involves  a tough  computation,  while  Q is  transparent  or  involves  easy 
computations,  then  we  have  found  a great  shortcut  for  better  understanding  or  faster 
computation.  Remember  that  every  theorem  really  is  a shortcut  in  some  form.  You 
will  also  discover  that  if  proving  P =>  Q is  very  easy,  then  proving  Q =>  P is  likely 
to  be  proportionately  harder.  Sometimes  the  two  halves  are  about  equally  hard.  And 
in  rare  cases,  you  can  string  together  a whole  sequence  of  other  equivalences  to  form 
the  one  you  are  after  and  you  do  not  even  need  to  do  two  halves.  In  this  case,  the 
argument  of  one  half  is  just  the  argument  of  the  other  half,  but  in  reverse. 

One  last  thing  about  equivalences.  If  you  see  a statement  of  a theorem  that  says 
two  things  are  “equivalent,”  translate  it  first  into  an  “if  and  only  if”  statement. 

Proof  Technique  N 
Negation 

When  we  construct  the  contrapositive  of  a theorem  (Proof  Technique  CP),  we  need 
to  negate  the  two  statements  in  the  implication.  And  when  we  construct  a proof 
by  contradiction  (Proof  Technique  CD),  we  need  to  negate  the  conclusion  of  the 
theorem.  One  way  to  construct  a converse  (Proof  Technique  CV)  is  to  simultaneously 
negate  the  hypothesis  and  conclusion  of  an  implication  (but  remember  that  this 
is  not  guaranteed  to  be  a true  statement).  So  we  often  have  the  need  to  negate 
statements,  and  in  some  situations  it  can  be  tricky. 

If  a statement  says  that  a set  is  empty,  then  its  negation  is  the  statement  that 
the  set  is  nonempty.  That  is  straightforward.  Suppose  a statement  says  “something- 
happens”  for  all  i,  or  every  i,  or  any  i.  Then  the  negation  is  that  “something-does- 
not-happen”  for  at  least  one  value  of  i.  If  a statement  says  that  there  exists  at 
least  one  “thing,”  then  the  negation  is  the  statement  that  there  is  no  “thing.”  If  a 
statement  says  that  a “thing”  is  unique,  then  the  negation  is  that  there  is  zero,  or 
more  than  one,  of  the  “thing.” 

We  are  not  covering  all  of  the  possibilities,  but  we  wish  to  make  the  point  that 
logical quantifiers like “there exists” or “for every” must be handled with care when
negating  statements.  Studying  the  proofs  which  employ  contradiction  (as  listed 
in  Proof  Technique  CD)  is  a good  first  step  towards  understanding  the  range  of 
possibilities. 

Proof  Technique  CP 
Contrapositives 

The  contrapositive  of  an  implication  P =>  Q is  the  implication  not(Q)  =>  not(P), 
where  “not”  means  the  logical  negation,  or  opposite.  An  implication  is  true  if  and 
only if its contrapositive is true. In symbols, (P => Q) <=> (not(Q) => not(P))
is a theorem. Such statements about logic, that are always true, are known as
tautologies.

For  example,  it  is  a theorem  that  “if  a vehicle  is  a fire  truck,  then  it  has  big 
tires  and  has  a siren.”  (Yes,  I’m  sure  you  can  conjure  up  a counterexample,  but  play 
along  with  me  anyway.)  The  contrapositive  is  “if  a vehicle  does  not  have  big  tires  or 
does  not  have  a siren,  then  it  is  not  a fire  truck.”  Notice  how  the  “and”  became  an 
“or”  when  we  negated  the  conclusion  of  the  original  theorem. 

It  will  frequently  happen  that  it  is  easier  to  construct  a proof  of  the  contrapositive 
than  of  the  original  implication.  If  you  are  having  difficulty  formulating  a proof  of 
some  implication,  see  if  the  contrapositive  is  easier  for  you.  The  trick  is  to  construct 
the  negation  of  complicated  statements  accurately.  More  on  that  later. 
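
Because P and Q can each only be true or false, the tautology above can be checked exhaustively. The following Python sketch (the helper name implies is an assumption made for this illustration) runs through all four truth assignments, and also confirms that negating an “and” produces an “or,” as in the fire truck example.

```python
from itertools import product

def implies(p, q):
    """Truth value of the implication p => q."""
    return (not p) or q

# All four combinations of truth values for two statements.
rows = list(product([True, False], repeat=2))

# (P => Q) <=> (not(Q) => not(P)) holds in every row, so it is a tautology.
contrapositive_ok = all(
    implies(p, q) == implies(not q, not p) for p, q in rows
)

# Negating "A and B" gives "not(A) or not(B)".
and_becomes_or = all(
    (not (a and b)) == ((not a) or (not b)) for a, b in rows
)

print(contrapositive_ok, and_becomes_or)
```

A truth table like this is a complete argument only because there are finitely many cases to examine.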

Proof  Technique  CV 
Converses 

The  converse  of  the  implication  P =>  Q is  the  implication  Q =>  P.  There  is  no 
guarantee that the truth of these two statements is related. In particular, if an
implication  has  been  proven  to  be  a theorem,  then  do  not  try  to  use  its  converse  too, 
as  if  it  were  a theorem.  Sometimes  the  converse  is  true  (and  we  have  an  equivalence, 
see  Proof  Technique  E).  But  more  likely  the  converse  is  false,  especially  if  it  was  not 
included  in  the  statement  of  the  original  theorem. 

For example, we have the theorem, “if a vehicle is a fire truck, then it has big
tires  and  has  a siren.”  The  converse  is  false.  The  statement  that  “if  a vehicle  has  big 
tires  and  a siren,  then  it  is  a fire  truck”  is  false.  A police  vehicle  for  use  on  a sandy 
public  beach  would  have  big  tires  and  a siren,  yet  is  not  equipped  to  fight  fires. 

We  bring  this  up  now,  because  Theorem  CSRN  has  a tempting  converse.  Does 
this  theorem  say  that  if  r < n,  then  the  system  is  consistent?  Definitely  not,  as 
Archetype  E has  r = 3 < 4 = n,  yet  is  inconsistent.  This  example  is  then  said 
to  be  a counterexample  to  the  converse.  Whenever  you  think  a theorem  that  is 
an  implication  might  actually  be  an  equivalence,  it  is  good  to  hunt  around  for  a 
counterexample  that  shows  the  converse  to  be  false  (the  archetypes,  Archetypes,  can 
be  a good  hunting  ground). 
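
The counterexample can be checked by machine. The following Python sketch row-reduces the augmented matrix of Archetype E (coefficients as listed in the Archetypes section) with exact rational arithmetic. The function rref is a hypothetical helper written for this illustration, not a routine from Sage. A pivot in the final column corresponds to a row reading 0 = 1, so the system has no solutions.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced matrix, list of pivot column indices), exact arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c, at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [v / lead for v in m[r]]
        # Clear the rest of column c.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

# Augmented matrix of Archetype E: four variables, last column is constants.
archetype_e = [
    [2, 1, 7, -7, 2],
    [-3, 4, -5, -6, 3],
    [1, 1, 4, -5, 2],
]

_, pivots = rref(archetype_e)
n = len(archetype_e[0]) - 1   # number of variables
r = len(pivots)               # number of pivot columns
consistent = n not in pivots  # a pivot in the last column means inconsistent

print(r, n, consistent)
```

So r < n holds for this system even though it is inconsistent, which is exactly why the converse fails.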

Proof  Technique  CD 
Contradiction 

Another  proof  technique  is  known  as  “proof  by  contradiction”  and  it  can  be  a 
powerful  (and  satisfying)  approach.  Simply  put,  suppose  you  wish  to  prove  the 
implication, “If A, then B.” As usual, we assume that A is true, but we also make
the  additional  assumption  that  B is  false.  If  our  original  implication  is  true,  then 
these  twin  assumptions  should  lead  us  to  a logical  inconsistency.  In  practice  we 
assume  the  negation  of  B to  be  true  (see  Proof  Technique  N).  So  we  argue  from  the 
assumptions  A and  not(B)  looking  for  some  obviously  false  conclusion  such  as  1 = 6, 
or  a set  is  simultaneously  empty  and  nonempty,  or  a matrix  is  both  nonsingular  and 
singular. 

You should be careful about formulating proofs that look like proofs by contradiction,
but really are not. This happens when you assume A and not(B) and proceed
to  give  a “normal”  and  direct  proof  that  B is  true  by  only  using  the  assumption 
that  A is  true.  Your  last  step  is  to  then  claim  that  B is  true  and  you  then  appeal  to 
the  assumption  that  not(B)  is  true,  thus  getting  the  desired  contradiction.  Instead, 
you  could  have  avoided  the  overhead  of  a proof  by  contradiction  and  just  run  with 
the  direct  proof.  This  stylistic  flaw  is  known,  quite  graphically,  as  “setting  up  the 
strawman  to  knock  him  down.” 

Here  is  a simple  example  of  a proof  by  contradiction.  There  are  direct  proofs  that 
are  just  about  as  easy,  but  this  will  demonstrate  the  point,  while  narrowly  avoiding 
knocking  down  the  straw  man. 

Theorem:  If  a and  b are  odd  integers,  then  their  product,  ab , is  odd. 

Proof:  To  begin  a proof  by  contradiction,  assume  the  hypothesis,  that  a and  b 
are  odd.  Also  assume  the  negation  of  the  conclusion,  in  this  case,  that  ab  is  even. 
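Then there are integers j, k, ℓ so that a = 2j + 1, b = 2k + 1, ab = 2ℓ. Then

\begin{align*}
0 &= ab - ab\\
&= (2j + 1)(2k + 1) - (2\ell)\\
&= 4jk + 2j + 2k - 2\ell + 1\\
&= 2(2jk + j + k - \ell) + 1
\end{align*}

The final expression is an odd integer, while 0 is even. This is the desired
contradiction.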

Again,  we  do  not  offer  this  example  as  the  best  proof  of  this  fact  about  even  and 
odd numbers, but rather it is a simple illustration of a proof by contradiction. You
can find examples of proofs by contradiction in Theorem RREFU, Theorem NMUS,
Theorem NPNT, Theorem TTMI, Theorem GSP, Theorem ELIS, Theorem EDYES,
Theorem EMHE, Theorem EDELI, and Theorem DMFE, in addition to several
examples and solutions to exercises.

Proof  Technique  U 
Uniqueness 

A theorem  will  sometimes  claim  that  some  object,  having  some  desirable  property, 
is  unique.  In  other  words,  there  should  be  only  one  such  object.  To  prove  this,  a 
standard  technique  is  to  assume  there  are  two  such  objects  and  proceed  to  analyze 
the  consequences.  The  end  result  may  be  a contradiction  (Proof  Technique  CD),  or 
the  conclusion  that  the  two  allegedly  different  objects  really  are  equal. 

Proof  Technique  ME 
Multiple  Equivalences 

A very  specialized  form  of  a theorem  begins  with  the  statement  “The  following  are 
equivalent. . . ,”  which  is  then  followed  by  a list  of  statements.  Informally,  this  lead-in 
sometimes  gets  abbreviated  by  “TFAE.”  This  formulation  means  that  any  two  of  the 
statements  on  the  list  can  be  connected  with  an  “if  and  only  if”  to  form  a theorem. 
So if the list has n statements, then there are $\frac{n(n-1)}{2}$ possible equivalences that can
be constructed (and are claimed to be true).

Suppose a theorem of this form has statements denoted as A, B, C, ..., Z. To
prove the entire theorem, we can prove A => B, B => C, C => D, ..., Y => Z and
finally, Z => A. This circular chain of n implications would allow us, logically, if
not practically, to form any one of the $\frac{n(n-1)}{2}$ possible equivalences by chasing the
implications around the circle as far as required.
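
The counting and the “chasing around the circle” can both be sketched in a few lines of Python. Here the statements are numbered 0 through n - 1, and the helper reachable is a hypothetical name introduced for this illustration. It follows the proven implications i => i + 1 (wrapping from the last statement back to the first).

```python
from itertools import combinations

def reachable(start, n):
    """Statements reachable from `start` by following i => (i + 1) mod n."""
    seen = set()
    current = start
    for _ in range(n):
        current = (current + 1) % n
        seen.add(current)
    return seen

n = 5  # number of statements on the list (an arbitrary choice)

# Every unordered pair of statements is one claimed equivalence.
pairs = list(combinations(range(n), 2))

# An equivalence needs an implication chain in both directions.
all_equivalent = all(
    j in reachable(i, n) and i in reachable(j, n) for i, j in pairs
)

print(len(pairs), n * (n - 1) // 2, all_equivalent)
```

Only n implications are proved, yet every one of the pairwise equivalences follows.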

Proof  Technique  PI 
Proving  Identities 

Many  theorems  have  conclusions  that  say  two  objects  are  equal.  Perhaps  one  object 
is  hard  to  compute  or  understand,  while  the  other  is  easy  to  compute  or  understand. 
This  would  make  for  a pleasing  theorem.  Whether  the  result  is  pleasing  or  not, 
we  take  the  same  approach  to  formulate  a proof.  Sometimes  we  need  to  employ 
specialized  notions  of  equality,  such  as  Definition  SE  or  Definition  CVE,  but  in  other 
cases  we  can  string  together  a list  of  equalities. 

The  wrong  way  to  prove  an  identity  is  to  begin  by  writing  it  down  and  then 
beating  on  it  until  it  reduces  to  an  obvious  identity.  The  first  flaw  is  that  you  would 
be  writing  down  the  statement  you  wish  to  prove,  as  if  you  already  believed  it  to  be 
true. But more dangerous is the possibility that some of your maneuvers are not
reversible. Here is an example. Let us prove that 3 = -3.

\begin{align*}
3 &= -3 && \text{(This is a bad start)}\\
3^2 &= (-3)^2 && \text{Square both sides}\\
9 &= 9\\
0 &= 0 && \text{Subtract 9 from both sides}
\end{align*}

So because 0 = 0 is a true statement, does it follow that 3 = -3 is a true
statement? Nope. Of course, we did not really expect a legitimate proof of 3 = -3,
but this attempt should illustrate the dangers of this (incorrect) approach.

What  you  have  just  seen  in  the  proof  of  Theorem  VSPCV,  and  what  you  will 
see  consistently  throughout  this  text,  is  proofs  of  the  following  form.  To  prove  that 
A = D we  write 

A = B Theorem,  Definition  or  Hypothesis  justifying  A = B 

= C Theorem,  Definition  or  Hypothesis  justifying  B = C 

= D Theorem,  Definition  or  Hypothesis  justifying  C = D 

In  your  scratch  work  exploring  possible  approaches  to  proving  a theorem  you 
may  massage  a variety  of  expressions,  sometimes  making  connections  to  various  bits 
and  pieces,  while  some  parts  get  abandoned.  Once  you  see  a line  of  attack,  rewrite 
your  proof  carefully  mimicking  this  style. 

Proof  Technique  DC 
Decompositions 

Much  of  your  mathematical  upbringing,  especially  once  you  began  a study  of  algebra, 
revolved  around  simplifying  expressions  — combining  like  terms,  obtaining  common 
denominators  so  as  to  add  fractions,  factoring  in  order  to  solve  polynomial  equations. 
However,  as  often  as  not,  we  will  do  the  opposite.  Many  theorems  and  techniques 
will  revolve  around  taking  some  object  and  “decomposing”  it  into  some  combination 
of  other  objects,  ostensibly  in  a more  complicated  fashion.  When  we  say  something 
can  “be  written  as”  something  else,  we  mean  that  the  one  object  can  be  decomposed 
into  some  combination  of  other  objects.  This  may  seem  unnatural  at  first,  but  results 
of  this  type  will  give  us  insight  into  the  structure  of  the  original  object  by  exposing 
its  inner  workings.  An  appropriate  analogy  might  be  stripping  the  wallboards  away 
from  the  interior  of  a building  to  expose  the  structural  members  supporting  the 
whole  building. 

Perhaps  you  have  studied  integral  calculus,  or  a pre-calculus  course,  where  you 
learned  about  partial  fractions.  This  is  a technique  where  a fraction  of  two  polynomials 
is decomposed into (written as, expressed as) a sum of simpler fractions. The purpose in
calculus is to make finding an antiderivative simpler. For example, you can verify
the truth of the expression

\[
\frac{12x^5 + 2x^4 - 20x^3 + 66x^2 - 294x + 308}{x^6 + x^5 - 3x^4 + 21x^3 - 52x^2 + 20x - 48}
= \frac{5x + 2}{x^2 - x + 6} + \frac{3x - 7}{x^2 + 1} + \frac{3}{x + 4} + \frac{1}{x - 2}
\]

In  an  early  course  in  algebra,  you  might  be  expected  to  combine  the  four  terms 
on  the  right  over  a common  denominator  to  create  the  “simpler”  expression  on 
the  left.  Going  the  other  way,  the  partial  fraction  technique  would  allow  you  to 
systematically  decompose  the  fraction  of  polynomials  on  the  left  into  the  sum  of  the 
four  (arguably)  simpler  fractions  of  polynomials  on  the  right. 
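
Combining the four fractions by hand is tedious, so here is a minimal Python sketch that does the polynomial arithmetic. Polynomials are stored as coefficient lists with the constant term first. The helper names poly_mul, poly_add and poly_product are assumptions made for this illustration.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Add two polynomials, padding the shorter list with zeros."""
    size = max(len(p), len(q))
    p = p + [0] * (size - len(p))
    q = q + [0] * (size - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_product(polys):
    """Multiply a list of polynomials together."""
    result = [1]
    for p in polys:
        result = poly_mul(result, p)
    return result

# The four fractions on the right-hand side of the identity.
denominators = [[6, -1, 1], [1, 0, 1], [4, 1], [-2, 1]]
numerators = [[2, 5], [-7, 3], [3], [1]]

common_denominator = poly_product(denominators)

# Each numerator is multiplied by the three denominators it is missing.
combined_numerator = [0]
for k, num in enumerate(numerators):
    others = [d for i, d in enumerate(denominators) if i != k]
    combined_numerator = poly_add(
        combined_numerator, poly_mul(num, poly_product(others))
    )

print(combined_numerator)
print(common_denominator)
```

Read from the highest degree down, the two printed lists are the coefficients of the numerator and denominator of the single fraction on the left.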

This  is  a major  shift  in  thinking,  so  come  back  here  often,  especially  when  we 
say “can be written as,” or “can be expressed as,” or “can be decomposed as.”


Proof  Technique  I 
Induction 


“Induction”  or  “mathematical  induction”  is  a framework  for  proving  statements  that 
are  indexed  by  integers.  In  other  words,  suppose  you  have  a statement  to  prove  that 
is  really  multiple  statements,  one  for  n = 1,  another  for  n = 2,  a third  for  n = 3, 
and  so  on.  If  there  is  enough  similarity  between  the  statements,  then  you  can  use  a 
script  (the  framework)  to  prove  them  all  at  once. 


For example, consider the theorem: $1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}$ for $n \geq 1$.

This is shorthand for the many statements $1 = \frac{1(1+1)}{2}$, $1 + 2 = \frac{2(2+1)}{2}$,
$1 + 2 + 3 = \frac{3(3+1)}{2}$, $1 + 2 + 3 + 4 = \frac{4(4+1)}{2}$, and so on. Forever. You can do the calculations in
each of these statements and verify that all four are true. We might not be surprised
to learn that the fifth statement is true as well (go ahead and check). However, do
we think the theorem is true for n = 872? Or n = 1,234,529?

To  see  that  these  questions  are  not  so  ridiculous,  consider  the  following  example 
from Rotman’s Journey into Mathematics. The statement “$n^2 - n + 41$ is prime” is
true for integers $1 \leq n \leq 40$ (check a few). However, when we check n = 41 we find
$41^2 - 41 + 41 = 41^2$, which is not prime.
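
If checking “a few” values by hand is not convincing, a short program can check all forty. The following Python sketch uses simple trial division. The names is_prime and g are hypothetical, chosen for this illustration.

```python
def is_prime(m):
    """Trial division; adequate for the small numbers used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def g(n):
    """The expression n^2 - n + 41."""
    return n * n - n + 41

# True for every integer n from 1 through 40 ...
primes_up_to_40 = all(is_prime(g(n)) for n in range(1, 41))

# ... and yet false at n = 41.
fails_at_41 = not is_prime(g(41))

print(primes_up_to_40, fails_at_41, g(41) == 41 ** 2)
```

Forty successes in a row still did not make the statement true for all n, which is the reason a proof is required.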

So  how  do  we  prove  infinitely  many  statements  all  at  once?  More  formally,  let  us 
denote  our  statements  as  P(n).  Then,  if  we  can  prove  the  two  assertions 


1. P(1) is true.

2.  If  P(k)  is  true,  then  P(k  + 1)  is  true. 


then it follows that P(n) is true for all $n \geq 1$. To understand this, I liken the
process to climbing an infinitely long ladder with equally spaced rungs. Confronted
with such a ladder, suppose I tell you that you are able to step up onto the first
rung, and if you are on any particular rung, then you are capable of stepping up to
the next rung. It follows that you can climb the ladder as far up as you wish. The
first formal assertion above is akin to stepping onto the first rung, and the second
formal assertion is akin to assuming that if you are on any one rung then you can
always reach the next rung.

In practice, establishing that P(1) is true is called the “base case” and in
most cases is straightforward. Establishing that P(k) => P(k + 1) is referred to
as the “induction step,” and in this book (and elsewhere) we will typically refer to
the assumption of P(k) as the “induction hypothesis.” This is perhaps the most
mysterious part of a proof by induction, since we are eventually trying to prove that
P(n) is true and it appears we do this by assuming what we are trying to prove
(when we assume P(k)). We are trying to prove the truth of P(n) (for all n), but in
the induction step we establish the truth of an implication, P(k) => P(k + 1), an
“if-then”  statement.  Sometimes  it  is  even  worse,  since  as  you  get  more  comfortable 
with  induction,  we  often  do  not  bother  to  use  a different  letter  (k)  for  the  index  (n) 
in  the  induction  step.  Notice  that  the  second  formal  assertion  never  says  that  P(k) 
is  true,  it  simply  says  that  if  P(k)  were  true,  what  might  logically  follow.  We  can 
establish  statements  like  “If  I lived  on  the  moon,  then  I could  pole-vault  over  a bar 
12  meters  high.”  This  may  be  a true  statement,  but  it  does  not  say  we  live  on  the 
moon,  and  indeed  we  may  never  live  there. 

Enough generalities. Let us work an example and prove the theorem above about
sums of integers. Formally, our statement is P(n): $1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}$.

Proof: Base Case. P(1) is the statement $1 = \frac{1(1+1)}{2}$, which we see simplifies to
the true statement 1 = 1.

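Induction Step: We will assume P(k) is true, and will try to prove P(k + 1).
Given what we want to accomplish, it is natural to begin by examining the sum of
the first k + 1 integers.

\begin{align*}
1 + 2 + 3 + \cdots + (k + 1) &= (1 + 2 + 3 + \cdots + k) + (k + 1)\\
&= \frac{k(k+1)}{2} + (k + 1) && \text{Induction Hypothesis}\\
&= \frac{k^2 + k}{2} + \frac{2k + 2}{2}\\
&= \frac{k^2 + 3k + 2}{2}\\
&= \frac{(k+1)(k+2)}{2}\\
&= \frac{(k+1)((k+1)+1)}{2}
\end{align*}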


We  then  recognize  the  two  ends  of  this  chain  of  equalities  as  P(k  + 1).  So,  by 
mathematical  induction,  the  theorem  is  true  for  all  n. 
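
A numerical spot check is no substitute for the proof, but it is reassuring to see the formula agree with brute-force addition, including for the two larger values of n asked about earlier. The following Python sketch uses hypothetical helper names closed_form and brute_force. It also checks the algebra of the induction step for the first thousand values of k.

```python
def closed_form(n):
    """The claimed value n(n + 1)/2 (always an integer)."""
    return n * (n + 1) // 2

def brute_force(n):
    """Add up 1 + 2 + ... + n directly."""
    return sum(range(1, n + 1))

# Spot checks, including the values asked about in the text.
spot_checks = {
    n: brute_force(n) == closed_form(n)
    for n in (1, 2, 3, 4, 5, 872, 1234529)
}

# The induction step says: (sum up to k) + (k + 1) = (sum up to k + 1).
step_ok = all(
    closed_form(k) + (k + 1) == closed_form(k + 1) for k in range(1, 1001)
)

print(all(spot_checks.values()), step_ok)
```

The program can only ever examine finitely many cases; it is the induction argument that covers every n at once.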

How  do  you  recognize  when  to  use  induction?  The  first  clue  is  a statement  that 
is  really  many  statements,  one  for  each  integer.  The  second  clue  would  be  that  you 
begin  a more  standard  proof  and  you  find  yourself  using  words  like  “and  so  on”  (as 
above!)  or  lots  of  ellipses  (dots)  to  establish  patterns  that  you  are  convinced  continue 
on  and  on  forever.  However,  there  are  many  minor  instances  where  induction  might 
be  warranted  but  we  do  not  bother. 

Induction  is  important  enough,  and  used  often  enough,  that  it  appears  in  various 
variations. The base case sometimes begins with n = 0, or perhaps an integer greater
than 1. Some formulate the induction step as P(k - 1) => P(k). There is also a
“strong form” of induction where we assume all of P(1), P(2), P(3), ..., P(k) as a
hypothesis for showing the conclusion P(k + 1).

You  can  find  examples  of  induction  in  the  proofs  of  Theorem  GSP,  Theorem 
DER,  Theorem  DT,  Theorem  DIM,  Theorem  EOMP,  Theorem  DCP,  and  Theorem 
UTMR. 

Proof  Technique  P 
Practice 

Here  is  a technique  used  by  many  practicing  mathematicians  when  they  are  teaching 
themselves  new  mathematics.  As  they  read  a textbook,  monograph  or  research  article, 
they  attempt  to  prove  each  new  theorem  themselves,  before  reading  the  proof.  Often 
the  proofs  can  be  very  difficult,  so  it  is  wise  not  to  spend  too  much  time  on  each. 
Maybe  limit  your  losses  and  try  each  proof  for  10  or  15  minutes.  Even  if  the  proof 
is  not  found,  it  is  time  well-spent.  You  become  more  familiar  with  the  definitions 
involved,  and  the  hypothesis  and  conclusion  of  the  theorem.  When  you  do  work 
through  the  proof,  it  might  make  more  sense,  and  you  will  gain  added  insight  about 
just  how  to  construct  a proof. 

Proof  Technique  LC 
Lemmas  and  Corollaries 

Theorems often go by different titles, two of the most popular being “lemma”
and “corollary.” Before we describe the fine distinctions, be aware that lemmas,
corollaries,  propositions,  claims  and  facts  are  all  just  theorems.  And  every  theorem 
can  be  rephrased  as  an  “if-then”  statement,  or  perhaps  a pair  of  “if-then”  statements 
expressed  as  an  equivalence  (Proof  Technique  E). 

A lemma  is  a theorem  that  is  not  too  interesting  in  its  own  right,  but  is  important 
for  proving  other  theorems.  It  might  be  a generalization  or  abstraction  of  a key 
step  of  several  different  proofs.  For  this  reason  you  often  hear  the  phrase  “technical 
lemma”  though  some  might  argue  that  the  adjective  “technical”  is  redundant. 

A corollary  is  a theorem  that  follows  very  easily  from  another  theorem.  For  this 
reason,  corollaries  frequently  do  not  have  proofs.  You  are  expected  to  easily  and 
quickly  see  how  a previous  theorem  implies  the  corollary. 

A proposition or fact is really just a codeword for a theorem. A claim might be
similar,  but  some  authors  like  to  use  claims  within  a proof  to  organize  key  steps.  In 
a similar  manner,  some  long  proofs  are  organized  as  a series  of  lemmas. 

In  order  to  not  confuse  the  novice,  we  have  just  called  all  our  theorems  theorems. 
It  is  also  an  organizational  convenience.  With  only  theorems  and  definitions,  the 
theoretical  backbone  of  the  course  is  laid  bare  in  the  two  lists  of  Definitions  and 
Theorems. 


Archetypes 

This  section  contains  definitions  and  capsule  summaries  for  each  archetypical  example. 
Comprehensive  and  detailed  analysis  of  each  can  be  found  in  the  online  supplement. 

Archetype A Linear system of three equations, three unknowns. Singular coefficient
matrix with dimension 1 null space. Integer eigenvalues and a degenerate
eigenspace for coefficient matrix.

\begin{align*}
x_1 - x_2 + 2x_3 &= 1\\
2x_1 + x_2 + x_3 &= 8\\
x_1 + x_2 &= 5
\end{align*}

Archetype B System with three equations, three unknowns. Nonsingular coefficient
matrix. Distinct integer eigenvalues for coefficient matrix.

\begin{align*}
-7x_1 - 6x_2 - 12x_3 &= -33\\
5x_1 + 5x_2 + 7x_3 &= 24\\
x_1 + 4x_3 &= 5
\end{align*}


Archetype  C System  with  three  equations,  four  variables.  Consistent.  Null  space 
of  coefficient  matrix  has  dimension  1. 

\begin{align*}
2x_1 - 3x_2 + x_3 - 6x_4 &= -7\\
4x_1 + x_2 + 2x_3 + 9x_4 &= -7\\
3x_1 + x_2 + x_3 + 8x_4 &= -8
\end{align*}

Archetype  D System  with  three  equations,  four  variables.  Consistent.  Null  space 
of  coefficient  matrix  has  dimension  2.  Coefficient  matrix  identical  to  that  of  Archetype 
E,  vector  of  constants  is  different. 

\begin{align*}
2x_1 + x_2 + 7x_3 - 7x_4 &= 8\\
-3x_1 + 4x_2 - 5x_3 - 6x_4 &= -12\\
x_1 + x_2 + 4x_3 - 5x_4 &= 4
\end{align*}


Archetype  E System  with  three  equations,  four  variables.  Inconsistent.  Null  space 
of  coefficient  matrix  has  dimension  2.  Coefficient  matrix  identical  to  that  of  Archetype 
D,  constant  vector  is  different. 

\begin{align*}
2x_1 + x_2 + 7x_3 - 7x_4 &= 2\\
-3x_1 + 4x_2 - 5x_3 - 6x_4 &= 3\\
x_1 + x_2 + 4x_3 - 5x_4 &= 2
\end{align*}


Archetype  F System  with  four  equations,  four  variables.  Nonsingular  coefficient 
matrix.  Integer  eigenvalues,  one  has  “high”  multiplicity. 

\begin{align*}
33x_1 - 16x_2 + 10x_3 - 2x_4 &= -27\\
99x_1 - 47x_2 + 27x_3 - 7x_4 &= -77\\
78x_1 - 36x_2 + 17x_3 - 6x_4 &= -52\\
-9x_1 + 2x_2 + 3x_3 + 4x_4 &= 5
\end{align*}


Archetype  G System  with  five  equations,  two  variables.  Consistent.  Null  space  of 
coefficient  matrix  has  dimension  0.  Coefficient  matrix  identical  to  that  of  Archetype 
H,  constant  vector  is  different. 

\begin{align*}
2x_1 + 3x_2 &= 6\\
-x_1 + 4x_2 &= -14\\
3x_1 + 10x_2 &= -2\\
3x_1 - x_2 &= 20\\
6x_1 + 9x_2 &= 18
\end{align*}


Archetype H System with five equations, two variables. Inconsistent, overdetermined.
Null space of coefficient matrix has dimension 0. Coefficient matrix identical
to that of Archetype G, constant vector is different.

\begin{align*}
2x_1 + 3x_2 &= 5\\
-x_1 + 4x_2 &= 6\\
3x_1 + 10x_2 &= 2\\
3x_1 - x_2 &= -1\\
6x_1 + 9x_2 &= 3
\end{align*}

Archetype  I System  with  four  equations,  seven  variables.  Consistent.  Null  space 
of  coefficient  matrix  has  dimension  4. 

\begin{align*}
x_1 + 4x_2 - x_4 + 7x_6 - 9x_7 &= 3\\
2x_1 + 8x_2 - x_3 + 3x_4 + 9x_5 - 13x_6 + 7x_7 &= 9\\
2x_3 - 3x_4 - 4x_5 + 12x_6 - 8x_7 &= 1\\
-x_1 - 4x_2 + 2x_3 + 4x_4 + 8x_5 - 31x_6 + 37x_7 &= 4
\end{align*}


Archetype  J System  with  six  equations,  nine  variables.  Consistent.  Null  space  of 
coefficient matrix has dimension 5.

\begin{align*}
x_1 + 2x_2 - 2x_3 + 9x_4 + 3x_5 - 5x_6 - 2x_7 + x_8 + 27x_9 &= -5\\
2x_1 + 4x_2 + 3x_3 + 4x_4 - x_5 + 4x_6 + 10x_7 + 2x_8 - 23x_9 &= 18\\
x_1 + 2x_2 + x_3 + 3x_4 + x_5 + x_6 + 5x_7 + 2x_8 - 7x_9 &= 6\\
2x_1 + 4x_2 + 3x_3 + 4x_4 - 7x_5 + 2x_6 + 4x_7 - 11x_9 &= 20\\
x_1 + 2x_2 + 5x_4 + 2x_5 - 4x_6 + 3x_7 + 8x_8 + 13x_9 &= -4\\
-3x_1 - 6x_2 - x_3 - 13x_4 + 2x_5 - 5x_6 - 4x_7 + 13x_8 + 10x_9 &= -29
\end{align*}


Archetype  K Square  matrix  of  size  5.  Nonsingular.  3 distinct  eigenvalues,  2 of 
multiplicity  2. 


\[
\begin{bmatrix}
10 & 18 & 24 & 24 & -12 \\
12 & -2 & -6 & 0 & -18 \\
-30 & -21 & -23 & -30 & 39 \\
27 & 30 & 36 & 37 & -30 \\
18 & 24 & 30 & 30 & -20
\end{bmatrix}
\]

Archetype L Square matrix of size 5. Singular, nullity 2. 2 distinct eigenvalues,
each of “high” multiplicity.

\[
\begin{bmatrix}
-2 & -1 & -2 & -4 & 4 \\
-6 & -5 & -4 & -4 & 6 \\
10 & 7 & 7 & 10 & -13 \\
-7 & -5 & -6 & -9 & 10 \\
-4 & -3 & -4 & -6 & 6
\end{bmatrix}
\]
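
The claim that Archetype L is singular with nullity 2 can be checked by row reduction: a square matrix of size 5 with nullity 2 must have rank 3. The following is a small sketch in plain Python, assuming exact rational arithmetic. The function name `rank` is illustrative only.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix, found by row reduction with exact rationals."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0  # index of the next pivot row
    for col in range(len(M[0])):
        if r == len(M):
            break  # every row already holds a pivot
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot available in this column
        M[r], M[pivot] = M[pivot], M[r]
        # Eliminate the entries below the pivot.
        for i in range(r + 1, len(M)):
            f = M[i][col] / M[r][col]
            M[i] = [v - f * w for v, w in zip(M[i], M[r])]
        r += 1
    return r

# The matrix of Archetype L.
L = [[-2, -1, -2, -4, 4],
     [-6, -5, -4, -4, 6],
     [10, 7, 7, 10, -13],
     [-7, -5, -6, -9, 10],
     [-4, -3, -4, -6, 6]]

r = rank(L)
print(r, len(L) - r)  # 3 2  (rank, nullity)
```

Since the rank is less than the size of the matrix, the matrix is singular, and the nullity is the number of columns minus the rank.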

Archetype  M Linear  transformation  with  bigger  domain  than  codomain,  so  it  is 
guaranteed  to  not  be  injective.  Happens  to  not  be  surjective. 


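\[
T\colon \mathbb{C}^5 \to \mathbb{C}^3, \quad
T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} \right) =
\begin{bmatrix}
x_1 + 2x_2 + 3x_3 + 4x_4 + 4x_5 \\
3x_1 + x_2 + 4x_3 - 3x_4 + 7x_5 \\
x_1 - x_2 - 5x_4 + x_5
\end{bmatrix}
\]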

Archetype  N Linear  transformation  with  domain  larger  than  its  codomain,  so  it 
is  guaranteed  to  not  be  injective.  Happens  to  be  onto. 


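\[
T\colon \mathbb{C}^5 \to \mathbb{C}^3, \quad
T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} \right) =
\begin{bmatrix}
2x_1 + x_2 + 3x_3 - 4x_4 + 5x_5 \\
x_1 - 2x_2 + 3x_3 - 9x_4 + 3x_5 \\
3x_1 + 4x_3 - 6x_4 + 5x_5
\end{bmatrix}
\]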

Archetype O Linear transformation with a domain smaller than the codomain,
so it is guaranteed to not be onto. Happens to not be one-to-one.

\[
T\colon \mathbb{C}^3 \to \mathbb{C}^5, \quad
T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) =
\begin{bmatrix}
-x_1 + x_2 - 3x_3 \\
-x_1 + 2x_2 - 4x_3 \\
x_1 + x_2 + x_3 \\
2x_1 + 3x_2 + x_3 \\
x_1 + 2x_3
\end{bmatrix}
\]
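
To illustrate why Archetype O is not one-to-one, it is enough to exhibit a nonzero vector that is sent to the zero vector. The sketch below uses the vector (-2, 1, 1), which was found by hand for this illustration and is not quoted from the text. The function name `T_O` is likewise hypothetical.

```python
def T_O(x1, x2, x3):
    """Archetype O: sends a vector of C^3 to a vector of C^5."""
    return [-x1 + x2 - 3 * x3,
            -x1 + 2 * x2 - 4 * x3,
            x1 + x2 + x3,
            2 * x1 + 3 * x2 + x3,
            x1 + 2 * x3]

# A nonzero input and the zero input have the same output,
# so T cannot be injective.
print(T_O(-2, 1, 1))  # [0, 0, 0, 0, 0]
print(T_O(0, 0, 0))   # [0, 0, 0, 0, 0]
```

Any scalar multiple of (-2, 1, 1) is also sent to the zero vector, by linearity.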


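Archetype P Linear transformation with a domain smaller than its codomain, so
it is guaranteed to not be surjective. Happens to be injective.

\[
T\colon \mathbb{C}^3 \to \mathbb{C}^5, \quad
T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) =
\begin{bmatrix}
-x_1 + x_2 + x_3 \\
-x_1 + 2x_2 + 2x_3 \\
x_1 + x_2 + 3x_3 \\
2x_1 + 3x_2 + x_3 \\
2x_1 + x_2 + 3x_3
\end{bmatrix}
\]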

Archetype  Q Linear  transformation  with  equal-sized  domain  and  codomain,  so 
it  has  the  potential  to  be  invertible,  but  in  this  case  is  not.  Neither  injective  nor 
surjective.  Diagonalizable,  though. 


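\[
T\colon \mathbb{C}^5 \to \mathbb{C}^5, \quad
T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} \right) =
\begin{bmatrix}
-2x_1 + 3x_2 + 3x_3 - 6x_4 + 3x_5 \\
-16x_1 + 9x_2 + 12x_3 - 28x_4 + 28x_5 \\
-19x_1 + 7x_2 + 14x_3 - 32x_4 + 37x_5 \\
-21x_1 + 9x_2 + 15x_3 - 35x_4 + 39x_5 \\
-9x_1 + 5x_2 + 7x_3 - 16x_4 + 16x_5
\end{bmatrix}
\]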

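Archetype R Linear transformation with equal-sized domain and codomain.
Injective, surjective, invertible, diagonalizable, the works.

\[
T\colon \mathbb{C}^5 \to \mathbb{C}^5, \quad
T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} \right) =
\begin{bmatrix}
-65x_1 + 128x_2 + 10x_3 - 262x_4 + 40x_5 \\
36x_1 - 73x_2 - x_3 + 151x_4 - 16x_5 \\
-44x_1 + 88x_2 + 5x_3 - 180x_4 + 24x_5 \\
34x_1 - 68x_2 - 3x_3 + 140x_4 - 18x_5 \\
12x_1 - 24x_2 - x_3 + 49x_4 - 5x_5
\end{bmatrix}
\]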

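Archetype S Domain is column vectors, codomain is matrices. Domain is
dimension 3 and codomain is dimension 4. Not injective, not surjective.

\[
T\colon \mathbb{C}^3 \to M_{22}, \quad
T\left( \begin{bmatrix} a \\ b \\ c \end{bmatrix} \right) =
\begin{bmatrix}
a - b & 2a + 2b + c \\
3a + b + c & -2a - 6b - 2c
\end{bmatrix}
\]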

Archetype T Domain and codomain are polynomials. Domain has dimension 5,
while codomain has dimension 6. Is injective, cannot be surjective.

\[
T\colon P_4 \to P_5, \quad T\left(p(x)\right) = (x - 2)\,p(x)
\]
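
As an illustration of Archetype T, a polynomial can be stored as a list of coefficients and multiplied by (x - 2) term by term. This is only a sketch under that representation (lowest degree first). The names `T_T` and `evaluate` and the sample polynomial are assumptions made for this example.

```python
def T_T(p):
    """Archetype T: multiply p(x) by (x - 2).

    Polynomials are coefficient lists, lowest degree first, so
    [c0, c1, c2, c3, c4] stands for c0 + c1 x + ... + c4 x^4."""
    out = [0] * (len(p) + 1)
    for k, c in enumerate(p):
        out[k] += -2 * c   # contribution of (-2) * c x^k
        out[k + 1] += c    # contribution of x * c x^k
    return out

def evaluate(p, t):
    """Evaluate the polynomial p at t by Horner's rule."""
    value = 0
    for c in reversed(p):
        value = value * t + c
    return value

p = [1, 0, 3, 0, -1]   # 1 + 3x^2 - x^4, an arbitrary element of P4
q = T_T(p)
print(q)               # [-2, 1, -6, 3, 2, -1]
print(evaluate(q, 2))  # 0
```

Every output of T has x = 2 as a root, so a polynomial such as the constant 1 in P5 is never an output. This is one concrete way to see that T is not surjective.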


Archetype  U Domain  is  matrices,  codomain  is  column  vectors.  Domain  has 
dimension  6,  while  codomain  has  dimension  4.  Cannot  be  injective,  is  surjective. 


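\[
T\colon M_{23} \to \mathbb{C}^4, \quad
T\left( \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \right) =
\begin{bmatrix}
a + 2b + 12c - 3d + e + 6f \\
2a - b - c + d - 11f \\
a + b + 7c + 2d + e - 3f \\
a + 2b + 12c + 5e - 5f
\end{bmatrix}
\]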

Archetype V Domain is polynomials, codomain is matrices. Both domain and
codomain have dimension 4. Injective, surjective, invertible. Square matrix
representation, but domain and codomain are unequal, so no eigenvalue information.

\[
T\colon P_3 \to M_{22}, \quad
T\left(a + bx + cx^2 + dx^3\right) =
\begin{bmatrix}
a + b & a - 2c \\
d & b - d
\end{bmatrix}
\]


Archetype W Domain is polynomials, codomain is polynomials. Domain and
codomain both have dimension 3. Injective, surjective, invertible, 3 distinct
eigenvalues, diagonalizable.

\[
T\colon P_2 \to P_2, \quad
T\left(a + bx + cx^2\right) = (19a + 6b - 4c) + (-24a - 7b + 4c)\,x + (36a + 12b - 9c)\,x^2
\]
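
The three distinct eigenvalues of Archetype W can be illustrated by checking candidate eigenvectors directly against the formula for T. In the sketch below a polynomial a + bx + cx^2 is stored as the triple (a, b, c). The eigenvalues 3, 1 and -1 and their eigenvectors were worked out by hand for this example rather than quoted from the text, and the helper names are hypothetical.

```python
def T_W(a, b, c):
    """Archetype W acting on a + b x + c x^2, stored as (a, b, c)."""
    return (19 * a + 6 * b - 4 * c,
            -24 * a - 7 * b + 4 * c,
            36 * a + 12 * b - 9 * c)

def is_eigenpair(lam, v):
    """True when T(v) equals lam * v for a nonzero v."""
    return any(v) and T_W(*v) == tuple(lam * t for t in v)

# Candidate eigenvalue -> candidate eigenvector (coefficients of 1, x, x^2).
candidates = {
    3: (1, -2, 1),   # 1 - 2x + x^2
    1: (1, -3, 0),   # 1 - 3x
    -1: (0, 2, 3),   # 2x + 3x^2
}

for lam, v in candidates.items():
    print(lam, v, is_eigenpair(lam, v))
```

Three distinct eigenvalues on a space of dimension 3 give three linearly independent eigenvectors, which is why the transformation is diagonalizable. Since 0 is not among them, the transformation is also invertible.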


Archetype X Domain and codomain are square matrices. Domain and codomain
both have dimension 4. Not injective, not surjective, not invertible, 3 distinct
eigenvalues, diagonalizable.

\[
T\colon M_{22} \to M_{22}, \quad
T\left( \begin{bmatrix} a & b \\ c & d \end{bmatrix} \right) =
\begin{bmatrix}
-2a + 15b + 3c + 27d & 10b + 6c + 18d \\
a - 5b - 9d & -a - 4b - 5c - 8d
\end{bmatrix}
\]

by  some  word  processors  for  output  purposes  only. 

The  “Title  Page”  means,  for  a printed  book,  the  title  page  itself,  plus  such  following 
pages  as  are  needed  to  hold,  legibly,  the  material  this  License  requires  to  appear  in  the  title 
page.  For  works  in  formats  which  do  not  have  any  title  page  as  such,  “Title  Page”  means 
the  text  near  the  most  prominent  appearance  of  the  work’s  title,  preceding  the  beginning 
of  the  body  of  the  text. 

The  “publisher”  means  any  person  or  entity  that  distributes  copies  of  the  Document 
to  the  public. 

A section  “Entitled  XYZ”  means  a named  subunit  of  the  Document  whose  title 
either  is  precisely  XYZ  or  contains  XYZ  in  parentheses  following  text  that  translates 
XYZ  in  another  language.  (Here  XYZ  stands  for  a specific  section  name  mentioned  below, 
such  as  “Acknowledgements”,  “Dedications”,  “Endorsements”,  or  “History”.)  To 
“Preserve  the  Title”  of  such  a section  when  you  modify  the  Document  means  that  it 
remains  a section  “Entitled  XYZ”  according  to  this  definition. 

The  Document  may  include  Warranty  Disclaimers  next  to  the  notice  which  states  that 
this  License  applies  to  the  Document.  These  Warranty  Disclaimers  are  considered  to  be 
included  by  reference  in  this  License,  but  only  as  regards  disclaiming  warranties:  any  other 
implication  that  these  Warranty  Disclaimers  may  have  is  void  and  has  no  effect  on  the 
meaning  of  this  License. 

2.  VERBATIM  COPYING 


You  may  copy  and  distribute  the  Document  in  any  medium,  either  commercially  or 
noncommercially,  provided  that  this  License,  the  copyright  notices,  and  the  license  notice 
saying  this  License  applies  to  the  Document  are  reproduced  in  all  copies,  and  that  you 
add  no  other  conditions  whatsoever  to  those  of  this  License.  You  may  not  use  technical 
measures  to  obstruct  or  control  the  reading  or  further  copying  of  the  copies  you  make  or 
distribute.  However,  you  may  accept  compensation  in  exchange  for  copies.  If  you  distribute 
a large  enough  number  of  copies  you  must  also  follow  the  conditions  in  section  3. 

You  may  also  lend  copies,  under  the  same  conditions  stated  above,  and  you  may  publicly 
display  copies. 


3.  COPYING  IN  QUANTITY 

If  you  publish  printed  copies  (or  copies  in  media  that  commonly  have  printed  covers)  of 
the  Document,  numbering  more  than  100,  and  the  Document’s  license  notice  requires  Cover 
Texts,  you  must  enclose  the  copies  in  covers  that  carry,  clearly  and  legibly,  all  these  Cover 
Texts:  Front-Cover  Texts  on  the  front  cover,  and  Back-Cover  Texts  on  the  back  cover.  Both 
covers  must  also  clearly  and  legibly  identify  you  as  the  publisher  of  these  copies.  The  front 
cover  must  present  the  full  title  with  all  words  of  the  title  equally  prominent  and  visible. 
You  may  add  other  material  on  the  covers  in  addition.  Copying  with  changes  limited  to 
the  covers,  as  long  as  they  preserve  the  title  of  the  Document  and  satisfy  these  conditions, 
can  be  treated  as  verbatim  copying  in  other  respects. 

If  the  required  texts  for  either  cover  are  too  voluminous  to  fit  legibly,  you  should  put 
the  first  ones  listed  (as  many  as  fit  reasonably)  on  the  actual  cover,  and  continue  the  rest 
onto  adjacent  pages. 

If  you  publish  or  distribute  Opaque  copies  of  the  Document  numbering  more  than  100, 
you  must  either  include  a machine-readable  Transparent  copy  along  with  each  Opaque  copy, 
or  state  in  or  with  each  Opaque  copy  a computer-network  location  from  which  the  general 
network-using  public  has  access  to  download  using  public-standard  network  protocols  a 
complete  Transparent  copy  of  the  Document,  free  of  added  material.  If  you  use  the  latter 
option,  you  must  take  reasonably  prudent  steps,  when  you  begin  distribution  of  Opaque 
copies  in  quantity,  to  ensure  that  this  Transparent  copy  will  remain  thus  accessible  at  the 
stated  location  until  at  least  one  year  after  the  last  time  you  distribute  an  Opaque  copy 
(directly  or  through  your  agents  or  retailers)  of  that  edition  to  the  public. 

It  is  requested,  but  not  required,  that  you  contact  the  authors  of  the  Document  well 
before  redistributing  any  large  number  of  copies,  to  give  them  a chance  to  provide  you  with 
an  updated  version  of  the  Document. 

4.  MODIFICATIONS 


You  may  copy  and  distribute  a Modified  Version  of  the  Document  under  the  conditions 
of  sections  2 and  3 above,  provided  that  you  release  the  Modified  Version  under  precisely 
this  License,  with  the  Modified  Version  filling  the  role  of  the  Document,  thus  licensing 
distribution  and  modification  of  the  Modified  Version  to  whoever  possesses  a copy  of  it.  In 
addition,  you  must  do  these  things  in  the  Modified  Version: 

A.  Use  in  the  Title  Page  (and  on  the  covers,  if  any)  a title  distinct  from  that  of  the 
Document,  and  from  those  of  previous  versions  (which  should,  if  there  were  any,  be 
listed  in  the  History  section  of  the  Document).  You  may  use  the  same  title  as  a 
previous  version  if  the  original  publisher  of  that  version  gives  permission. 

B.  List  on  the  Title  Page,  as  authors,  one  or  more  persons  or  entities  responsible  for 
authorship  of  the  modifications  in  the  Modified  Version,  together  with  at  least  five 
of  the  principal  authors  of  the  Document  (all  of  its  principal  authors,  if  it  has  fewer 
than  five),  unless  they  release  you  from  this  requirement. 

C.  State  on  the  Title  page  the  name  of  the  publisher  of  the  Modified  Version,  as  the 
publisher. 

D.  Preserve  all  the  copyright  notices  of  the  Document. 

E.  Add  an  appropriate  copyright  notice  for  your  modifications  adjacent  to  the  other 
copyright  notices. 

F.  Include,  immediately  after  the  copyright  notices,  a license  notice  giving  the  public 
permission  to  use  the  Modified  Version  under  the  terms  of  this  License,  in  the  form 
shown  in  the  Addendum  below. 

G.  Preserve  in  that  license  notice  the  full  lists  of  Invariant  Sections  and  required  Cover 
Texts  given  in  the  Document’s  license  notice. 

H.  Include  an  unaltered  copy  of  this  License. 

I.  Preserve  the  section  Entitled  “History”,  Preserve  its  Title,  and  add  to  it  an  item 
stating  at  least  the  title,  year,  new  authors,  and  publisher  of  the  Modified  Version  as 
given  on  the  Title  Page.  If  there  is  no  section  Entitled  “History”  in  the  Document, 
create  one  stating  the  title,  year,  authors,  and  publisher  of  the  Document  as  given 
on  its  Title  Page,  then  add  an  item  describing  the  Modified  Version  as  stated  in  the 
previous  sentence. 

J.  Preserve  the  network  location,  if  any,  given  in  the  Document  for  public  access  to 
a Transparent  copy  of  the  Document,  and  likewise  the  network  locations  given  in 
the  Document  for  previous  versions  it  was  based  on.  These  may  be  placed  in  the 
“History”  section.  You  may  omit  a network  location  for  a work  that  was  published  at 
least  four  years  before  the  Document  itself,  or  if  the  original  publisher  of  the  version 
it  refers  to  gives  permission. 

K.  For  any  section  Entitled  “Acknowledgements”  or  “Dedications”,  Preserve  the  Title 
of  the  section,  and  preserve  in  the  section  all  the  substance  and  tone  of  each  of  the 
contributor  acknowledgements  and/or  dedications  given  therein. 

L.  Preserve  all  the  Invariant  Sections  of  the  Document,  unaltered  in  their  text  and  in 
their  titles.  Section  numbers  or  the  equivalent  are  not  considered  part  of  the  section 
titles. 

M.  Delete  any  section  Entitled  “Endorsements”.  Such  a section  may  not  be  included  in 
the  Modified  Version. 

N.  Do  not  retitle  any  existing  section  to  be  Entitled  “Endorsements”  or  to  conflict  in 
title  with  any  Invariant  Section. 

O.  Preserve  any  Warranty  Disclaimers. 

If  the  Modified  Version  includes  new  front-matter  sections  or  appendices  that  qualify  as 
Secondary  Sections  and  contain  no  material  copied  from  the  Document,  you  may  at  your 
option  designate  some  or  all  of  these  sections  as  invariant.  To  do  this,  add  their  titles  to 
the  list  of  Invariant  Sections  in  the  Modified  Version’s  license  notice.  These  titles  must  be 
distinct  from  any  other  section  titles. 

You  may  add  a section  Entitled  “Endorsements”,  provided  it  contains  nothing  but 
endorsements  of  your  Modified  Version  by  various  parties — for  example,  statements  of  peer 
review  or  that  the  text  has  been  approved  by  an  organization  as  the  authoritative  definition 
of  a standard. 

You  may  add  a passage  of  up  to  five  words  as  a Front-Cover  Text,  and  a passage  of  up 
to  25  words  as  a Back-Cover  Text,  to  the  end  of  the  list  of  Cover  Texts  in  the  Modified 
Version.  Only  one  passage  of  Front-Cover  Text  and  one  of  Back-Cover  Text  may  be  added 
by  (or  through  arrangements  made  by)  any  one  entity.  If  the  Document  already  includes  a 
cover  text  for  the  same  cover,  previously  added  by  you  or  by  arrangement  made  by  the 
same  entity  you  are  acting  on  behalf  of,  you  may  not  add  another;  but  you  may  replace 
the  old  one,  on  explicit  permission  from  the  previous  publisher  that  added  the  old  one. 

The  author(s)  and  publisher(s)  of  the  Document  do  not  by  this  License  give  permission 
to  use  their  names  for  publicity  for  or  to  assert  or  imply  endorsement  of  any  Modified 
Version. 


5.  COMBINING  DOCUMENTS 


You  may  combine  the  Document  with  other  documents  released  under  this  License, 
under  the  terms  defined  in  section  4 above  for  modified  versions,  provided  that  you  include 
in  the  combination  all  of  the  Invariant  Sections  of  all  of  the  original  documents,  unmodified, 
and  list  them  all  as  Invariant  Sections  of  your  combined  work  in  its  license  notice,  and  that 
you  preserve  all  their  Warranty  Disclaimers. 

The  combined  work  need  only  contain  one  copy  of  this  License,  and  multiple  identical 
Invariant  Sections  may  be  replaced  with  a single  copy.  If  there  are  multiple  Invariant 
Sections  with  the  same  name  but  different  contents,  make  the  title  of  each  such  section 
unique  by  adding  at  the  end  of  it,  in  parentheses,  the  name  of  the  original  author  or 
publisher  of  that  section  if  known,  or  else  a unique  number.  Make  the  same  adjustment  to 
the  section  titles  in  the  list  of  Invariant  Sections  in  the  license  notice  of  the  combined  work. 

In  the  combination,  you  must  combine  any  sections  Entitled  “History”  in  the  various 
original  documents,  forming  one  section  Entitled  “History”;  likewise  combine  any  sections 
Entitled  “Acknowledgements”,  and  any  sections  Entitled  “Dedications”.  You  must  delete 
all  sections  Entitled  “Endorsements”. 

6.  COLLECTIONS  OF  DOCUMENTS 


You  may  make  a collection  consisting  of  the  Document  and  other  documents  released 
under  this  License,  and  replace  the  individual  copies  of  this  License  in  the  various  documents 
with  a single  copy  that  is  included  in  the  collection,  provided  that  you  follow  the  rules  of 
this  License  for  verbatim  copying  of  each  of  the  documents  in  all  other  respects. 

You  may  extract  a single  document  from  such  a collection,  and  distribute  it  individually 
under  this  License,  provided  you  insert  a copy  of  this  License  into  the  extracted  document, 
and  follow  this  License  in  all  other  respects  regarding  verbatim  copying  of  that  document. 

7.  AGGREGATION  WITH  INDEPENDENT  WORKS 


A compilation  of  the  Document  or  its  derivatives  with  other  separate  and  independent 
documents  or  works,  in  or  on  a volume  of  a storage  or  distribution  medium,  is  called  an 
“aggregate”  if  the  copyright  resulting  from  the  compilation  is  not  used  to  limit  the  legal 
rights  of  the  compilation’s  users  beyond  what  the  individual  works  permit.  When  the 
Document  is  included  in  an  aggregate,  this  License  does  not  apply  to  the  other  works  in 
the  aggregate  which  are  not  themselves  derivative  works  of  the  Document. 

If  the  Cover  Text  requirement  of  section  3 is  applicable  to  these  copies  of  the  Document, 
then  if  the  Document  is  less  than  one  half  of  the  entire  aggregate,  the  Document’s  Cover 
Texts  may  be  placed  on  covers  that  bracket  the  Document  within  the  aggregate,  or  the 
electronic  equivalent  of  covers  if  the  Document  is  in  electronic  form.  Otherwise  they  must 
appear  on  printed  covers  that  bracket  the  whole  aggregate. 

8.  TRANSLATION 


Translation  is  considered  a kind  of  modification,  so  you  may  distribute  translations  of 
the  Document  under  the  terms  of  section  4.  Replacing  Invariant  Sections  with  translations 
requires  special  permission  from  their  copyright  holders,  but  you  may  include  translations 
of  some  or  all  Invariant  Sections  in  addition  to  the  original  versions  of  these  Invariant 
Sections.  You  may  include  a translation  of  this  License,  and  all  the  license  notices  in  the 
Document,  and  any  Warranty  Disclaimers,  provided  that  you  also  include  the  original 
English  version  of  this  License  and  the  original  versions  of  those  notices  and  disclaimers.  In 
case  of  a disagreement  between  the  translation  and  the  original  version  of  this  License  or  a 
notice  or  disclaimer,  the  original  version  will  prevail. 

If  a section  in  the  Document  is  Entitled  “Acknowledgements”,  “Dedications”,  or 
“History”,  the  requirement  (section  4)  to  Preserve  its  Title  (section  1)  will  typically  require 
changing  the  actual  title. 


9.  TERMINATION 


You  may  not  copy,  modify,  sublicense,  or  distribute  the  Document  except  as  expressly 
provided  under  this  License.  Any  attempt  otherwise  to  copy,  modify,  sublicense,  or  distribute 
it  is  void,  and  will  automatically  terminate  your  rights  under  this  License. 

However,  if  you  cease  all  violation  of  this  License,  then  your  license  from  a particular 
copyright  holder  is  reinstated  (a)  provisionally,  unless  and  until  the  copyright  holder  explicitly 
and  finally  terminates  your  license,  and  (b)  permanently,  if  the  copyright  holder  fails  to 
notify  you  of  the  violation  by  some  reasonable  means  prior  to  60  days  after  the  cessation. 

Moreover,  your  license  from  a particular  copyright  holder  is  reinstated  permanently  if 
the  copyright  holder  notifies  you  of  the  violation  by  some  reasonable  means,  this  is  the  first 
time  you  have  received  notice  of  violation  of  this  License  (for  any  work)  from  that  copyright 
holder,  and  you  cure  the  violation  prior  to  30  days  after  your  receipt  of  the  notice. 

Termination  of  your  rights  under  this  section  does  not  terminate  the  licenses  of  parties 
who  have  received  copies  or  rights  from  you  under  this  License.  If  your  rights  have  been 
terminated  and  not  permanently  reinstated,  receipt  of  a copy  of  some  or  all  of  the  same 
material  does  not  give  you  any  rights  to  use  it. 

10.  FUTURE  REVISIONS  OF  THIS  LICENSE 


The  Free  Software  Foundation  may  publish  new,  revised  versions  of  the  GNU  Free 
Documentation  License  from  time  to  time.  Such  new  versions  will  be  similar  in  spirit  to 
the  present  version,  but  may  differ  in  detail  to  address  new  problems  or  concerns.  See 
http://www.gnu.org/copyleft/. 

Each  version  of  the  License  is  given  a distinguishing  version  number.  If  the  Document 
specifies  that  a particular  numbered  version  of  this  License  “or  any  later  version”  applies 
to  it,  you  have  the  option  of  following  the  terms  and  conditions  either  of  that  specified 
version  or  of  any  later  version  that  has  been  published  (not  as  a draft)  by  the  Free  Software 
Foundation.  If  the  Document  does  not  specify  a version  number  of  this  License,  you  may 
choose  any  version  ever  published  (not  as  a draft)  by  the  Free  Software  Foundation.  If  the 
Document  specifies  that  a proxy  can  decide  which  future  versions  of  this  License  can  be 
used,  that  proxy’s  public  statement  of  acceptance  of  a version  permanently  authorizes  you 
to  choose  that  version  for  the  Document. 

11.  RELICENSING 

“Massive  Multiauthor  Collaboration  Site”  (or  “MMC  Site”)  means  any  World  Wide 
Web  server  that  publishes  copyrightable  works  and  also  provides  prominent  facilities  for 
anybody  to  edit  those  works.  A public  wiki  that  anybody  can  edit  is  an  example  of  such  a 
server.  A “Massive  Multiauthor  Collaboration”  (or  “MMC”)  contained  in  the  site  means 
any  set  of  copyrightable  works  thus  published  on  the  MMC  site. 

“CC-BY-SA”  means  the  Creative  Commons  Attribution-Share  Alike  3.0  license 
published  by  Creative  Commons  Corporation,  a not-for-profit  corporation  with  a principal 
place  of  business  in  San  Francisco,  California,  as  well  as  future  copyleft  versions  of  that 
license  published  by  that  same  organization. 

“Incorporate”  means  to  publish  or  republish  a Document,  in  whole  or  in  part,  as  part 
of  another  Document. 

An  MMC  is  “eligible  for  relicensing”  if  it  is  licensed  under  this  License,  and  if  all 
works  that  were  first  published  under  this  License  somewhere  other  than  this  MMC,  and 
subsequently  incorporated  in  whole  or  in  part  into  the  MMC,  (1)  had  no  cover  texts  or 
invariant  sections,  and  (2)  were  thus  incorporated  prior  to  November  1,  2008. 


The  operator  of  an  MMC  Site  may  republish  an  MMC  contained  in  the  site  under 
CC-BY-SA  on  the  same  site  at  any  time  before  August  1,  2009,  provided  the  MMC  is 
eligible  for  relicensing. 

ADDENDUM:  How  to  use  this  License  for  your  documents 

To  use  this  License  in  a document  you  have  written,  include  a copy  of  the  License  in 
the  document  and  put  the  following  copyright  and  license  notices  just  after  the  title  page: 


Copyright  © YEAR  YOUR  NAME.  Permission  is  granted  to  copy,  distribute 
and/or  modify  this  document  under  the  terms  of  the  GNU  Free  Documentation 
License,  Version  1.3  or  any  later  version  published  by  the  Free  Software 
Foundation;  with  no  Invariant  Sections,  no  Front-Cover  Texts,  and  no  Back-Cover 
Texts.  A copy  of  the  license  is  included  in  the  section  entitled  “GNU  Free 
Documentation  License”. 


If  you  have  Invariant  Sections,  Front-Cover  Texts  and  Back-Cover  Texts,  replace  the 
“with ... Texts.”  line  with  this: 


with  the  Invariant  Sections  being  LIST  THEIR  TITLES,  with  the  Front-Cover 
Texts  being  LIST,  and  with  the  Back-Cover  Texts  being  LIST. 


If  you  have  Invariant  Sections  without  Cover  Texts,  or  some  other  combination  of  the 
three,  merge  those  two  alternatives  to  suit  the  situation. 

If  your  document  contains  nontrivial  examples  of  program  code,  we  recommend  releasing 
these  examples  in  parallel  under  your  choice  of  free  software  license,  such  as  the  GNU 
General  Public  License,  to  permit  their  use  in  free  software. 


